JP7776221B2

JP7776221B2 - Location-based semantic similarity platform

Info

Publication number: JP7776221B2
Application number: JP2024556330A
Authority: JP
Inventors: McKenzie, Grant; Battersby, Sarah E.; Setlur, Vidya Raghavan
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-11-15
Filing date: 2022-11-15
Publication date: 2025-11-26
Anticipated expiration: 2042-11-15
Also published as: EP4433911A1; WO2023087032A1; JP2024545364A

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. Patent Application No. 17/986,822, entitled "Place-Based Semantic Similarity Platform," filed November 14, 2022, which claims priority to (i) U.S. Provisional Patent Application No. 63/279,667, entitled "Place-Based Semantic Similarity Platform," filed November 15, 2021, and (ii) U.S. Provisional Patent Application No. 63/285,476, entitled "Place-Based Semantic Similarity Platform," filed December 2, 2021, each of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD
The disclosed implementations relate generally to data visualization, and more particularly to systems, methods, and user interfaces that provide place-based semantic similarity.

Similarity between objects is defined intuitively. Trees and shrubs are similar because both are plants. Trees and apartment buildings, on the other hand, are quite different, even though both are often described by their height. At its core, understanding what makes things similar is highly complex and nuanced. For example, researchers have studied the concept of similarity with the aim of decomposing it into features and into the ways in which people individually understand and evaluate similarity.

Comparisons between objects such as trees, shrubs, and apartment buildings may seem like obvious, intuitive assessments, but identifying similarity is not easy when dealing with diverse sociodemographic characteristics such as race, age, and income. It also depends on context and on what matters to an individual when interpreting similarity. In terms of these characteristics, which regions of the United States are most similar to San Francisco, and in what ways? In terms of racial composition, which neighborhood in Chicago is most similar to the Bronx, New York?

What other locations are similar to a neighborhood? How are they similar? Why are they similar? Finding similarity or dissimilarity between locations is central to much of spatial analysis. Discovering patterns and interpreting similarity is a complex process that draws on both spatial properties and the semantics, or meaning, assigned to places. Human conceptualization of similarity between locations is multifaceted and cannot be captured by a simple evaluation of a single numeric attribute such as population density or average income. These quantifiable attributes do, however, form the basis for a first pass at sensemaking.

One difficulty in measuring similarity using socioeconomic and demographic variables is the sheer volume and variety of the available data. In traditional demographic studies, researchers may hand-pick a few simple variables, such as average income or age, and use them as independent variables in a statistical analysis to identify correlations. Sometimes researchers examine one attribute at a time, comparing all possible geographic locations in terms of whether values are higher or lower between those locations (e.g., census tract A has 10% more people than census tract B). Neither of these methods, however, uses the relationships in the data across a potentially large group of demographic variables.

Accordingly, there is a need for systems, methods, and interfaces that facilitate the incorporation of similarity measures and spatial analysis to provide information reduction and/or semantic generalization. The techniques described herein help bring users closer to actionable insights. The techniques can be used in geospatial queries to determine similarity between regions, and participants can manipulate the individual weights of the various attributes that describe these locations. Some implementations use context and additional place-specific parameters to calculate similarity. Some implementations provide geospatial analysis tools that exploit semantic nuance for place similarity.

Some implementations use a statistical approach to determine similarity between geographic regions (e.g., regions within the United States). Some implementations provide a data hub that makes it easy for users to incorporate this type of similarity measure into their analyses. A framework according to the techniques described herein makes it easy for people to work with diverse attributes from the U.S. Census in order to identify locations that are more similar or less similar, using the attributes that interest the user. Some implementations determine similarity using calculations based on the Jensen-Shannon Divergence (JSD) and/or present the results in an easy-to-read map. Some implementations show details on demand in a tooltip. The use of JSD to assess similarity for data analysis is described in detail below, according to some implementations.

According to some implementations, a method for visual analysis of a dataset is provided. The method is performed at a computer system. A user selects a data source. In response, the system presents a graphical user interface for analysis of data in the data source. The data includes geospatial data points. The system also presents a map data visualization within the graphical user interface. The map data visualization includes a plurality of geographic regions. Each geographic region corresponds to a respective one or more geospatial data points. In response to receiving a first user input selecting a first set of one or more geographic regions of the plurality of geographic regions, the system calculates, using one or more statistical techniques, similarity between the first set of one or more geographic regions and a second set of one or more geographic regions of the plurality of geographic regions, based on a set of attributes (e.g., data fields from the data source). The system then updates and displays the map data visualization according to the calculated similarity.

In some implementations, the set of attributes includes one or more socioeconomic, demographic, and geographic variables.

In some implementations, updating the map data visualization includes highlighting or lowlighting at least one geographic region of the second set of one or more geographic regions.

In some implementations, the method further includes, in response to receiving a second user input selecting coordinates of a search polygon on the map data visualization, defining the second one or more regions based on the coordinates.

In some implementations, the method further includes comparing the coordinates of the search polygon with the corresponding one or more geospatial data points for each geographic region of the plurality of geographic regions, in order to identify the second set of one or more geographic regions.
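The comparison between a search polygon and each region's geospatial data points can be illustrated with a small sketch. The text does not say how the comparison is performed; the ray-casting test below is one common choice and is an assumption, as are the names `point_in_polygon` and `regions_in_polygon` and the rule that a region is selected when at least one of its points falls inside the polygon.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), for example (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def regions_in_polygon(region_points: Dict[str, List[Point]],
                       polygon: List[Point]) -> List[str]:
    """Hypothetical helper: ids of regions with at least one data point inside the polygon."""
    return [
        region_id
        for region_id, points in region_points.items()
        if any(point_in_polygon(p, polygon) for p in points)
    ]
```

A production system would more likely delegate this to a spatial database or geometry library; the sketch only shows the shape of the comparison.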

In some implementations, each attribute of the set of attributes is associated with a corresponding weight of a plurality of weights, and the method further includes calculating the similarity based on the plurality of weights.

In some implementations, the method further includes providing one or more affordances, each affordance corresponding to a respective attribute of the set of attributes.

In some implementations, the method further includes, in response to receiving a second user input selecting a first affordance of the one or more affordances: (i) adjusting a first weight for a first attribute corresponding to the first affordance, to obtain an updated set of weights; (ii) calculating, using the one or more statistical techniques, updated similarity between the first set of one or more geographic regions and the second set of one or more geographic regions based on the updated set of weights; and (iii) updating and displaying the map data visualization according to the updated similarity.
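The weight adjustment and recalculation described above might look like the following sketch. The text does not state how weights enter the calculation; a weighted average of per-attribute divergence values (0 for identical, 1 for maximally different) is assumed here, and `weighted_similarity` and `adjust_weight` are hypothetical names.

```python
from typing import Dict


def weighted_similarity(divergences: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Combine per-attribute divergences into one similarity score in [0, 1].

    Assumption: the combination is a weighted average, and similarity is
    one minus the averaged divergence.
    """
    total_weight = sum(weights[name] for name in divergences)
    if total_weight == 0:
        raise ValueError("at least one attribute needs a non-zero weight")
    averaged = sum(weights[name] * d for name, d in divergences.items()) / total_weight
    return 1.0 - averaged


def adjust_weight(weights: Dict[str, float], attribute: str,
                  new_weight: float) -> Dict[str, float]:
    """Return an updated copy of the weights; the original set is left untouched."""
    updated = dict(weights)
    updated[attribute] = new_weight
    return updated
```

Returning a new dictionary rather than mutating the old one keeps the previous set of weights available, which is convenient if the interface offers an undo or a comparison between settings.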

In some implementations, the method further includes providing a store affordance for storing the updated set of weights. In response to the user selecting the store affordance, the method stores the updated set of weights in a preset file for a next session.

In some implementations, the method further includes, for the next session, retrieving the preset file and using the updated set of weights to calculate similarity between the first set of one or more geographic regions and the second set of one or more geographic regions.
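Storing and retrieving a preset file can be sketched as follows. The text does not specify a file format; JSON is an assumption, as are the function names `save_preset` and `load_preset` and the `"weights"` key.

```python
import json
from pathlib import Path
from typing import Dict


def save_preset(weights: Dict[str, float], path: Path) -> None:
    """Write the current weights to a preset file (JSON format is an assumption)."""
    path.write_text(json.dumps({"weights": weights}, indent=2), encoding="utf-8")


def load_preset(path: Path, defaults: Dict[str, float]) -> Dict[str, float]:
    """Load weights for a new session, falling back to defaults when no preset exists."""
    if not path.exists():
        return dict(defaults)
    stored = json.loads(path.read_text(encoding="utf-8")).get("weights", {})
    # Attributes missing from the preset keep their default weight.
    return {**defaults, **stored}
```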

In some implementations, the map data visualization is a choropleth map, and updating and displaying the map data visualization according to the calculated similarity includes displaying a gradient from maximum similarity to minimum similarity.

In some implementations, the method further includes: (i) providing a first affordance for selecting a choropleth map and a second affordance for selecting a most-least map; (ii) in response to user selection of the first affordance, displaying a gradient from maximum similarity to minimum similarity; and (iii) in response to user selection of the second affordance, displaying the most similar regions and the least similar regions.

In some implementations, the method further includes: (i) providing a plurality of affordances, each affordance corresponding to a respective maximum number of regions; and (ii) in response to user selection of one affordance of the plurality of affordances, displaying the most similar regions and the least similar regions within the second set of one or more regions, based on the maximum number of regions corresponding to the selected affordance.
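A most-least display limited to a maximum number of regions could be driven by a ranking step such as the one sketched below. The function name `most_and_least_similar` is hypothetical, and capping the count at half the candidates (so that no region appears in both lists) is a design assumption rather than something the text states.

```python
from typing import Dict, List, Tuple


def most_and_least_similar(scores: Dict[str, float],
                           max_regions: int) -> Tuple[List[str], List[str]]:
    """Return up to max_regions most similar and least similar region ids.

    scores maps a region id to its similarity to the selected region(s);
    higher means more similar.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: never let the two lists overlap.
    n = max(0, min(max_regions, len(ranked) // 2))
    most = ranked[:n]
    least = ranked[::-1][:n]  # least similar first
    return most, least
```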

In some implementations, the method further includes: (i) providing a plurality of affordances, each affordance corresponding to a respective subset of sub-regions of a plurality of sub-regions; and (ii) in response to user selection of an affordance of the plurality of affordances, (a) ceasing to present the map data visualization and (b) presenting an alternative map data visualization within the graphical user interface. The alternative map data visualization includes the subset of sub-regions corresponding to the affordance.

In some implementations, the graphical user interface includes a first portion and a second portion, and the method further includes (i) displaying the map data visualization in the first portion and (ii) displaying, in the second portion, a summary of the similarity between the first one or more geographic regions and the second one or more geographic regions.

In some implementations, each of the geographic regions corresponds to a respective census tract.

In some implementations, calculating the similarity includes calculating, over the set of attributes, a semantic similarity matrix for the first set of one or more geographic regions and the second set of one or more geographic regions of the plurality of geographic regions.

In some implementations, calculating the similarity includes calculating the Jensen-Shannon Divergence (JSD) between pairs of geographic regions from the first set of one or more geographic regions and the second set of one or more geographic regions.
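As a minimal sketch of the pairwise calculation, the standard definition of the Jensen-Shannon Divergence can be applied to each pair of regions, where each region is represented by a probability vector over one attribute dimension. The function names and the choice of base-2 logarithms (which bounds the divergence to the range 0 to 1) are assumptions; the Jensen-Shannon distance mentioned elsewhere in this document is the square root of the divergence.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the average of p and q."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_matrix(first_set: Sequence[Sequence[float]],
               second_set: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: divergence for every (first-set, second-set) pair of regions."""
    return [[jensen_shannon_divergence(p, q) for q in second_set] for p in first_set]
```

Because the mixture `m` is positive wherever `p` or `q` is positive, the divergence stays finite even when one region has an empty category, which is one reason JSD is convenient for sparse census distributions.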

In another aspect, an electronic device includes one or more processors, memory, a display, and one or more programs stored in the memory. The programs are configured for execution by the one or more processors and are configured to perform any of the methods described herein.

In another aspect, a non-transitory computer-readable storage medium stores one or more programs configured for execution by a computing device having one or more processors, memory, and a display. The one or more programs are configured to perform any of the methods described herein.

Thus, methods, systems, and graphical user interfaces are disclosed that enable users to efficiently explore data displayed within a data visualization application.

Both the foregoing general description and the following detailed description are exemplary and explanatory, and are intended to provide further explanation of the invention as claimed.

For a better understanding of the aforementioned systems, methods, and graphical user interfaces, as well as additional systems, methods, and graphical user interfaces that provide data visualization analytics, refer to the detailed description below, in conjunction with the following drawings, in which like reference numerals refer to corresponding parts throughout the figures.
Figures 1A-1 and 1A-2 show maps displaying the relative proportions of different races in specific locations, in contrast with using similarity.
Figure 1B is an example of a map showing similarity across multiple attributes using Jensen-Shannon distance, according to some implementations.
The remaining drawings (figure numbers not recoverable here) show, each according to some implementations: examples of a graphical user interface displaying a map; an example chart; a graphical user interface with a help menu; a block diagram of a computing device; a schematic diagram of an example process for calculating similarity values; example user interfaces; different map-style visualizations for similarity data; and a flowchart of an example method for visual analysis of data.

Reference will now be made to implementations, examples of which are illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details.

By some estimates, more than 80% of business datasets contain a spatial component (e.g., address, latitude/longitude, state, or country). Because of the strong relationship between business data and location, users often center their questions and exploration on spatial patterns in the data. Many of these user queries and interactions are tied to absolute location (e.g., "How many customers are in California?"), but a wide range of important questions and avenues for exploration would benefit from additional flexibility in a system that is better attuned to the semantics of place. This is not true only of business-related questions. Many decision-making opportunities involve evaluating the relationships between locations to provide context. The relationship between locations may itself be the answer (e.g., "Which places are similar to this place?"), or the relationship may be a preliminary step in a larger analytical process (e.g., "Knowing which places are similar to this place, we can use those locations when evaluating a school district busing policy."). The key to this type of question is similarity. Quantifying similarity, however, is difficult. A location is more than a number of attributes; the way people understand the relationships between locations is strongly tied to the character, or semantic meaning, given to those locations.

Sensemaking about the world often depends on context. The relative importance of a location is based primarily on how it compares with other locations. Contextual evaluation relies on metrics of what is "similar" or "different," and of how similar or different. These metrics may be based on an ordinal interpretation of visual patterns (for example, which regions are light and which are dark on a map) or on a quantitative metric of indexed values that represents a multidimensional similarity score.

Figure 1A-1 is a map that uses a single similarity measure to compare racial distributions across geospatial areas. Figure 1A-2 shows maps 100 that display the relative proportions of different races in the same locations as Figure 1A-1. A user who wants to know how similar different places on the maps are must mentally combine each of these four maps and work out which places are most alike. Combining each of these distributions into a single similarity measure, as shown in Figure 1A-1, makes it easier to see how the places compare with one another.

Figure 1B is an example of a map 102 showing similarity across multiple attributes using Jensen-Shannon distance, according to some implementations.

Even with maps that are well designed to visualize patterns in attributes (e.g., data fields from a selected data source), evaluating similarity between locations can be difficult. For a single attribute (e.g., the percentage of the population that is Black or African American, as in Figure 1A-2), a user can look for similar shades on the map as an indicator of similarity. For example, all census tracts shown in dark green might be considered similar. A reader's understanding of regions and their relationships often depends on numerous attributes, however, and accurately identifying similarity between locations is practically impossible when the reader must visually interpret the patterns and then mentally aggregate them to evaluate similarity. Methods exist for visualizing a small number of variables on a single map (e.g., bivariate or trivariate choropleth maps), but the complexity of larger numbers of variables calls for the techniques described herein.

Helping people identify and easily explore similarity across multiple variables of interest involves many intertwined challenges. In addition to the general challenge of collecting appropriate data and calculating similarity, there is the broader problem of modeling similarity in a way that makes sense and that lets people adjust the calculation based on their own intent. Spatial similarity as a concept is highly personal; it is influenced by what a user can concretely measure, by what the user perceives about a location, and by how the user rates the importance of the individual factors used to evaluate similarity. Even for calculations that use the same general inputs (e.g., the attributes described herein), individuals may weight some of those inputs as more important or less important than others when thinking about similarity.

Some implementations use models for spatial and attribute similarity. This enables improved recommendations, possible refinement of the query itself (e.g., Los Angeles as a region in Figures 1A-1 and 1A-2 versus the exact city boundary of Los Angeles), and an expansion of the ways in which users can be guided to relevant data. This may be done through recommendations of similar datasets, or for analytical questions driven by the need to match characteristics of interest. For example, a company may select a region containing a subset of its top donors in order to evaluate its characteristics, and then look for similar locations to target in outreach or advertising efforts. Some implementations use a place-similarity matrix to recommend other regions or potential candidates with similar place characteristics (e.g., similar socioeconomic demographics or interests). These regions need not be near the original query location.

Some implementations provide a tool consisting of two components: a front-end interactive web platform and a back-end data store. A set of PHP web handlers passes data between the two components based on requests from web clients (e.g., when a user selects a census tract with the mouse). In some implementations, the front end is an interactive web map built using the Leaflet framework, together with a set of DOM controls built using the D3 and jQuery frameworks. In some implementations, data is stored in a spatially enabled (PostGIS) PostgreSQL database and linked to census tract geometries using unique census geographic identifiers. In some implementations, geographic boundaries (e.g., census tract boundaries) are stored as GeoJSON and layered on the Leaflet base map at page load.
Example data

The techniques described herein can be used to analyze any type of socioeconomic and demographic data. For illustrative purposes, this section describes the application of the techniques to census tract data from the 2019 American Community Survey (ACS). This includes census tract-level five-year estimates for the state of California (sometimes referred to as census tract or ACS data). The census tract granularity is the logical resolution for analysis. Some implementations provide tools for visual analysis of the data. The scaffolding in this tool is geography agnostic, so these regions can be replaced with higher-resolution (e.g., census block groups) or lower-resolution (e.g., counties) geographies as needed. ACS data includes five dimensions: age, race, income, educational attainment, and mode of commuting. Each of these dimensions is a distribution across a set of individual socioeconomic or demographic attributes. For example, in the age dimension, for each census tract in California, there are estimates of the number of people aged 0-10, 10-15, 15-25, etc. To compare values across regions, some implementations normalize all attributes in the dimension, producing a numeric vector that sums to 1 for each dimension within each census tract. The ACS attributes within a dimension are mutually exclusive and complementary, so this normalization is valid.
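The normalization step can be illustrated with a minimal sketch. The bin counts and the function name `normalize_dimension` are hypothetical, and the uniform fallback for an empty tract is an assumption that the source does not address.

```python
def normalize_dimension(counts):
    """Convert raw counts for one dimension of one tract into proportions summing to 1."""
    total = sum(counts)
    if total == 0:
        # Assumption: a tract with no population gets a uniform vector.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


# Hypothetical counts for five age bins in a single census tract.
age_counts = [120, 80, 200, 400, 200]
age_vector = normalize_dimension(age_counts)
```

Because every tract's vector sums to 1, each vector can be treated as a probability distribution, which is what the similarity measure in the next section expects.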

Some implementations calculate the Euclidean distance between every pair of census tract geometries in the data, which allows users to control the influence of proximity when identifying similar areas. Population density is calculated for each census tract, as is a Boolean value indicating whether the census tract is within 20 miles of the coast. External data can be incorporated into the tool to allow for more precise filtering.
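As a rough illustration of the pairwise distance step, the sketch below represents each tract by a single centroid in a projected (planar) coordinate system. Both the use of centroids and the tract identifiers are assumptions; the source only states that Euclidean distances between tract geometries are computed.

```python
import math


def pairwise_distances(centroids):
    """Return {(id_a, id_b): distance} for every unordered pair of tracts."""
    ids = sorted(centroids)
    distances = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # math.dist computes the Euclidean distance between two points.
            distances[(a, b)] = math.dist(centroids[a], centroids[b])
    return distances


# Hypothetical centroids in planar units (e.g., kilometers).
centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 8.0)}
distances = pairwise_distances(centroids)
```

Precomputing the full table once lets the proximity weighting described later be applied without recalculating geometry on every user interaction.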
Examples of methods for defining similarity

Given a set of ACS data divided into five dimensions, each containing a normalized vector of binned socioeconomic or demographic values, some implementations calculate the pairwise similarity of all census tracts separately for each dimension. To accomplish this, some implementations calculate the Jensen-Shannon distance (JSD). JSD is a method for measuring dissimilarity between two probability distributions. It uses a relative entropy approach for the two distributions, based on the Kullback-Leibler divergence (KLD) (Equation 2), but differs from KLD in that it is symmetric and the resulting measure is finite. JSD has been used successfully to assess similarity for a wide range of applications, from predicting aesthetic rankings to differentiating health content presentation methods. In the geographic domain, JSD has been used for tasks such as distinguishing landmarks and assessing land-use patterns.

The JSD formula is given in Equation 1, where CT_A and CT_B are normalized vectors of the same census dimension (e.g., racial distribution) for two different census tracts, and x is a single attribute value in the dimension vector X.
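A minimal sketch of the computation follows, using the standard textbook definitions rather than the source's own equations: KL(P‖Q) = Σ P(x)·log2(P(x)/Q(x)), and JSD(P, Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = ½(P + Q). The text refers to JSD both as a "distance" and as a "divergence", so both variants are shown; the base-2 logarithm is assumed because it gives the 0-to-1 bound stated in the description.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Jensen-Shannon distance: the square root of the divergence."""
    return math.sqrt(max(0.0, js_divergence(p, q)))


# Hypothetical normalized race vectors for two tracts.
tract_a = [0.7, 0.2, 0.1]
tract_b = [0.1, 0.3, 0.6]
value = js_divergence(tract_a, tract_b)
```

The midpoint distribution is nonzero wherever either input is nonzero, which is why the measure stays finite even when one tract has an empty bin.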

The result of this analysis is a set of individual values, one per dimension, that quantify the similarity between two census tracts based on the five distributions of ACS data. This process is repeated for all pairs of census tracts, producing five similarity matrices, one for each ACS dimension. JSD values are bounded between 0 (identical) and 1 (complete dissimilarity), but the actual range of JSD values depends on the underlying input distributions. These ranges vary considerably across dimensions, with some reporting a maximum JSD of 0.5 and others reporting 0.9. Because the ultimate goal is to determine a single aggregate value to visually represent the similarity between regions, some implementations combine the JSD values of the individual dimensions into one value per census tract. Some implementations simply average the values. However, because the JSD ranges differ, even an evenly weighted approach effectively weights certain dimensions more than others. To mitigate this issue, some implementations first normalize the JSD values for each census tract relative to all other geographies. This produces a range of 0 to 1 for all dimensions in all geographies. Finally, some implementations convert the dissimilarity values to similarities by subtracting each JSD value from 1. These five matrices of normalized JSD values are used to update the data visualization.
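The rescaling and inversion can be sketched as follows. The source says only that JSD values are normalized to a 0-to-1 range; min-max rescaling is assumed here as one way to achieve that, and the handling of a row whose values are all equal is likewise an assumption.

```python
def jsd_to_similarity(jsd_row):
    """Rescale one tract's JSD values for one dimension to [0, 1], then invert."""
    lo, hi = min(jsd_row), max(jsd_row)
    if hi == lo:
        # Assumption: if every tract is equally dissimilar, treat all as fully similar.
        return [1.0 for _ in jsd_row]
    return [1.0 - (v - lo) / (hi - lo) for v in jsd_row]


# Hypothetical JSD values from a selected tract to three other tracts.
income_similarity = jsd_to_similarity([0.0, 0.25, 0.5])
```

After this step a dimension whose raw JSD values peak at 0.5 and one whose values peak at 0.9 both span the full 0-to-1 range, so neither dominates an evenly weighted combination.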

いくつかの実装形態では、次のステップは、これらの独立したＡＣＳ次元の各々についてのＪＳＤ値を、国勢統計区の各ペアについての単一の類似性値にマージすることを含む。この単一の値は、類似性がユーザによって表形式で評価され、視覚化のために色密度に変換される基礎である。図３０は、いくつかの実装形態による、類似性値を計算するための例示的なプロセス３０００の概略図を示す。このプロセスは、ＡＣＳ分布から開始し、国勢統計区ＡおよびＢという２つのサンプルを使用して単一の類似性値を計算する。いくつかの実装形態は、２つの地域（例えば、図３０の国勢統計区ＡおよびＢ）間の類似性を計算する。各国勢統計区は、複数の次元のデータ３００２（例えば、年齢、人種、収入、教育、および通勤手段などの５つの次元のデータ）を含む。データの各次元は、国勢調査値の分布である。いくつかの実装形態は、各国勢統計区の次元の間のＪＳＤ３００４を個々に計算し、次いで、各ＪＳＤに対応するユーザ重み３００６（ユーザ定義の重みと呼ばれることもある）を乗算し、最後に、それらの積を合計して（３００８）、国勢統計区のペアについての単一の類似性値３０１０を生成する。 In some implementations, the next step involves merging the JSD values for each of these independent ACS dimensions into a single similarity value for each pair of census tracts. This single value is the basis on which similarity is assessed by the user in tabular form and converted to color density for visualization. Figure 30 shows a schematic diagram of an example process 3000 for calculating similarity values according to some implementations. The process starts with an ACS distribution and calculates a single similarity value using two samples, census tracts A and B. Some implementations calculate the similarity between two regions (e.g., census tracts A and B in Figure 30). Each census tract contains multiple dimensions of data 3002 (e.g., five dimensions of data, such as age, race, income, education, and commute). Each dimension of data is a distribution of census values. Some implementations calculate the JSD 3004 between each census tract dimension individually, then multiply each JSD by a corresponding user weight 3006 (sometimes called a user-defined weight), and finally sum 3008 the products to generate a single similarity value 3010 for the census tract pair.
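The multiply-and-sum step of process 3000 can be sketched as below. The dimension names and values are hypothetical, and the inputs are assumed to be the already normalized per-dimension values for one pair of tracts.

```python
def weighted_similarity(values, weights):
    """Combine per-dimension values into one number using weights that sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(values[dim] * weights[dim] for dim in values)


# Hypothetical per-dimension values for census tracts A and B.
values = {"age": 0.8, "race": 0.6, "income": 0.4, "education": 0.2, "commute": 1.0}
# A user who cares only about age and income, equally.
weights = {"age": 0.5, "race": 0.0, "income": 0.5, "education": 0.0, "commute": 0.0}
combined = weighted_similarity(values, weights)
```

Setting a weight to 0 removes that dimension from the result entirely, which is how a single-dimension comparison falls out of the same formula.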

Rather than averaging dimension-specific JSD values, some implementations instead allow users to determine the impact each dimension has on the overall similarity of census tracts. Some implementations provide a set of user-defined weights as sliders in a graphical user interface. A weight is assigned to each dimension, and all weights sum to 1. Exposing these weights encourages users to tune the model to best meet their analytical requirements. Users have different preferences, objectives, and exploration goals, and the opportunity for individuals or groups to control the similarity assessment process empowers users and improves the usefulness of the tool.
User Interface Example

図３１は、いくつかの実装形態による、例示的なユーザインターフェース３１００を示す。例示的なユーザインターフェースは、マップデータ視覚化３１０２を含む。この例は、（Ａ）とラベル付けされた、選択されたロケーションと比較した米国国勢統計区の類似性を示す。類似性に関する詳細は、ツールチップ（Ｂ）に含まれる。ユーザインターフェースは、ユーザが、関心のある特性を重み付けして（Ｃ）、ユーザの質問および関心にしたがって類似性の計算を調整することを可能にし、視覚化のための複数のマップタイプ（Ｄ）と、ユーザが特定のミキサ設定およびフィルタを用いてプリセットファイルを保存および再使用することを可能にする機能（Ｅ）とを提供する。対話型テキスト記述（Ｆ）およびソート可能な要約表（Ｇ）も提供される。 Figure 31 shows an example user interface 3100 according to some implementations. The example user interface includes a map data visualization 3102. This example shows the similarity of U.S. census tracts compared to a selected location, labeled (A). Details about the similarity are included in a tooltip (B). The user interface allows the user to weight characteristics of interest (C) to tailor the similarity calculation according to the user's question and interests, and provides multiple map types for visualization (D), as well as the ability to allow the user to save and reuse preset files with specific mixer settings and filters (E). An interactive text description (F) and a sortable summary table (G) are also provided.

In some implementations, upon launching the data visualization application, the user is presented with a map (e.g., a map showing census tracts in California) in a uniform color (e.g., a uniform gray). Map labels showing neighborhoods, towns, and cities (depending on the zoom level) are overlaid on top of the census tracts as a reference layer. A vertical panel on the left side of the screen allows the user to select a census tract by clicking on the map. Once a census tract is selected, the census tract's identifier is sent via a web handler to the database, which returns a JSON response containing the pre-computed JSD similarity values for the five dimensions. Each of the five JSD values for each census tract is then multiplied by a normalized user-defined weight (the weights are equal at page load), and the products are summed to generate a single similarity value for each census tract.
Map panel example

いくつかの実装形態では、次いで、類似性値は、等間隔コロプレス色方式（equal interval choropleth color scheme）を使用してマップ上に変換され、マップ上の国勢統計区レイヤに適用される。いくつかの実装形態では、より濃い色または色合い（例えば、より濃い青色値）は、より高い類似性を示す。いくつかの実装形態では、ツールチップ機能性により、ユーザは、マップの各国勢統計区上にカーソルを乗せ、国勢統計区識別子、郡の名前、類似性ランク、および選択された国勢統計区との類似性一致率を含む情報を受け取ることができる。いくつかの実装形態では、設定メニュー内で、ユーザは、追加のツールチップ詳細を有効にするオプションを有し、これは、個々の次元の各々に対する類似性値をツールチップに追加する。 In some implementations, the similarity values are then translated onto a map using an equal interval choropleth color scheme and applied to a census tract layer on the map. In some implementations, darker colors or shades (e.g., darker blue values) indicate higher similarity. In some implementations, tooltip functionality allows the user to hover over each census tract on the map and receive information including the census tract identifier, county name, similarity rank, and similarity match percentage with the selected census tract. In some implementations, within the settings menu, the user has the option to enable additional tooltip details, which add similarity values for each individual dimension to the tooltip.
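An equal-interval classification can be sketched as follows. The number of classes (five) and the hex colors in `BLUES` are illustrative assumptions; the source specifies only that the scheme is equal interval and that darker shades indicate higher similarity.

```python
# Hypothetical light-to-dark blue palette, one color per class.
BLUES = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]


def equal_interval_class(value, n_classes=5, lo=0.0, hi=1.0):
    """Return the index of the equal-width bin that contains value."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    index = int((value - lo) / (hi - lo) * n_classes)
    # Clamp so that value == hi falls in the top class rather than overflowing.
    return min(max(index, 0), n_classes - 1)


def tract_color(similarity):
    """Map a similarity value in [0, 1] to a fill color; darker means more similar."""
    return BLUES[equal_interval_class(similarity, n_classes=len(BLUES))]
```

With equal intervals every class covers the same width of the similarity range, so the legend reads linearly even when most tracts fall into only a few classes.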

いくつかの実装形態は、ユーザが、関心のある国勢統計区を選択することを可能にするマップパネルを提供する。いくつかの実装形態は、（１つまたは複数の）選択された地域とデータセット内のすべての他の地域との間の類似性を地図形式および／または表形式で提示することによって応答する。いくつかの実装形態は、ズーミングおよびパニングを通じてマップと対話し、マップを探索する能力も提供する。 Some implementations provide a map panel that allows the user to select a census tract of interest. Some implementations respond by presenting, in map and/or tabular form, the similarity between the selected tract(s) and all other tracts in the dataset. Some implementations also provide the ability to interact with and explore the map through zooming and panning.

Figure 20A shows an example user interface 2000 that provides polygon drawing functionality, according to some implementations. In some implementations, the polygon drawing tool allows a user to sub-select a set of census tracts for analysis. A user can select the polygon drawing tool and manually draw an area on the map to limit the similarity assessment to a specified subset of interest, as shown in Figure 20B. In some implementations, from the map panel, a user can also print the map, change the base map from standard map tiles to satellite imagery, and toggle the map labels, the county boundary layer, and the main census tract layer.
Tabular panel example

In some implementations, once the map is updated to show regional similarities, another panel is presented below the map. The additional panel provides descriptive content about the similarity analysis. In some implementations, descriptive text in this panel presents the number of highly similar census tracts, the number of counties in which they are found, and/or the number of similar census tracts in the same county as the selection. The text includes embedded hyperlinks that allow the user to zoom to one or more of the top counties for the selected census tract. In addition, a table is presented listing the top five most similar census tracts, their similarity match percentage, the name of the county, and the distance and direction from the selected census tract. In some implementations, the user can click a row in the table to highlight the census tract on the map, or select the magnifying glass icon to zoom to that census tract. Clicking the column header for similarity in this table toggles between descending and ascending order, allowing the user to easily identify the most similar and least similar census tracts.
Side Panel Example

In some implementations, once a census tract is selected, a side panel also appears, providing various interactive tools to enable data exploration and analysis. In some implementations, the side panel includes a series of widgets, including a mixer, map types, presets, location bookmarks, and/or geographic filters.
Mixer widget example

いくつかの実装形態では、ミキサは対話型機能性を提供し、これを通じて、ユーザは、類似性値への全体的な寄与における社会経済的または人口統計学的次元の重要性（重み）を調整することができる。これらの重みは、ユーザが、スライダを右に移動させることによって重みを増加させ、スライダを左に移動させることによって重みを減少させることを可能にするスライダによって表される。いくつかの実装形態では、デフォルトで、ミキサは、５０の重要度値で均等に重み付けされる。ユーザがミキサを調整すると、次元ラベルの色密度が変化し、重みの数値表現が変化し（０と１００との間に制限される）、ミキサに関連付けられたツールチップが更新されて、この調整が全体的な類似性モデルに与えている影響をユーザに知らせる。 In some implementations, the mixer provides interactive functionality through which the user can adjust the importance (weight) of socioeconomic or demographic dimensions in their overall contribution to the similarity value. These weights are represented by sliders that allow the user to increase the weight by moving the slider to the right and decrease the weight by moving the slider to the left. In some implementations, by default, the mixers are equally weighted with an importance value of 50. As the user adjusts the mixer, the color intensity of the dimension labels changes, the numerical representation of the weights changes (constrained between 0 and 100), and the tooltip associated with the mixer is updated to inform the user of the impact their adjustments are having on the overall similarity model.

いくつかの実装形態では、ウィジェット中のミキサのうちの少なくともいくつかは、社会経済的または人口統計学的次元ではない。例えば、このウィジェット内の最初の５つのミキサは、国勢調査データの社会経済的および人口統計学的次元であり得るが、最後のミキサ（例えば、近接性ミキサ）はそうではない。この近接性ミキサは、ミックス内の２つの国勢統計区間のユークリッド距離の重みを調整する。ミックスにおける近接性の重みを増加させることによって、選択された国勢統計区に物理的により近い国勢統計区は、より離れた国勢統計区よりも類似していると見なされる。近接性ミキサの値を０に調整すると、地理的な近接性の影響が完全に排除される。 In some implementations, at least some of the mixers in the widget are not socioeconomic or demographic dimensions. For example, the first five mixers in this widget may be socioeconomic and demographic dimensions of census data, but the last mixer (e.g., the proximity mixer) is not. This proximity mixer adjusts the Euclidean distance weight of two census tracts in the mix. By increasing the proximity weight in the mix, census tracts that are physically closer to the selected census tract are considered more similar than census tracts that are further away. Adjusting the proximity mixer value to 0 completely eliminates the influence of geographic proximity.
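The mixer behavior can be sketched as a conversion from slider positions (0 to 100) into weights that sum to 1, with proximity treated as a sixth term. The linear proximity score (1 minus distance divided by a maximum distance) and all names below are assumptions made for illustration; the source does not state how distance is converted into a similarity contribution.

```python
def mixer_to_weights(sliders):
    """Convert slider positions in [0, 100] into weights that sum to 1."""
    for name, position in sliders.items():
        if not 0 <= position <= 100:
            raise ValueError(f"slider {name!r} must be between 0 and 100")
    total = sum(sliders.values())
    if total == 0:
        raise ValueError("at least one slider must be above 0")
    return {name: position / total for name, position in sliders.items()}


def proximity_similarity(distance, max_distance):
    """Assumed linear score: 1.0 at zero distance, 0.0 at or beyond max_distance."""
    return 1.0 - min(distance, max_distance) / max_distance


# Default mixer: every slider at 50, so every term is weighted equally.
default_sliders = {"age": 50, "race": 50, "income": 50,
                   "education": 50, "commute": 50, "proximity": 50}
default_weights = mixer_to_weights(default_sliders)

# Proximity slider at 0: geographic distance no longer affects the result.
no_proximity = mixer_to_weights({**default_sliders, "proximity": 0})
```

Normalizing by the slider total means only the relative positions matter: all sliders at 50 and all sliders at 100 produce the same weights.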

Once the user identifies a combination of weights that is useful for their analysis, they have the option to save that mix as a new preset. Doing so updates the preset widget, generates a preset XML file for download, and presents a unique uniform resource locator (URL) for sharing with collaborators.
Map-type widget example

Figures 32A and 32B show different cartographic visualizations 3200 and 3202, respectively, of the same underlying similarity data, according to some implementations. By default, some implementations present census tract similarity using a gradient-based choropleth map (Figure 32A). While this representation is useful in many situations, it may not be the most appropriate cartographic visualization in others. For this reason, some implementations offer alternative map-type options, such as Most/Least (Figure 32B). This option simplifies the map and presents each tract's similarity as one of three options: very similar in a first color (e.g., blue), very dissimilar in a second color (e.g., red), or somewhere in between in a third color (e.g., gray). In some implementations, the user can further refine what "very" means by selecting the top 1,000, 100, or 10 census tracts that are similar/dissimilar to the selected census tract. This cartographic approach is particularly useful for users who prefer a Boolean (similar or dissimilar) visualization instead of a gradient.
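A minimal sketch of the Most/Least classification follows. The function name and labels are hypothetical, the input is assumed to exclude the selected tract itself, and giving "most" precedence when the two groups would overlap is an assumption.

```python
def most_least_classes(similarities, n=10):
    """Label the n most similar tracts 'most', the n least similar 'least', the rest 'middle'."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    labels = {tract: "middle" for tract in ranked}
    for tract in ranked[-n:]:
        labels[tract] = "least"
    # Assigned last so that 'most' wins if the two groups overlap (assumption).
    for tract in ranked[:n]:
        labels[tract] = "most"
    return labels


# Hypothetical similarity values relative to a selected tract.
similarities = {"t1": 0.9, "t2": 0.1, "t3": 0.5, "t4": 0.7, "t5": 0.3}
labels = most_least_classes(similarities, n=1)
```

Each label would then be drawn in its own color (e.g., blue, red, and gray), and changing `n` to 10, 100, or 1,000 corresponds to the user redefining what "very" similar means.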
Preset widget examples

In some implementations, once a user creates a mix by adjusting sliders in the mixer widget, the user can choose to label and save the mix as a preset, which appears as a button in the widget. Multiple presets can be created to represent various scenarios and enable different types of analysis. These presets are also saved in a preset XML file and stored on the server with a unique identifier that can also be appended to a URL to share the preset with collaborators. If a user created or shared a preset via a preset XML file in a previous session, the user can also upload this file via the settings menu, which automatically adds buttons to the widget that adjust the mixer.

Location bookmark widget example

In some implementations, the location bookmark widget stores locations of interest as buttons that, when clicked, zoom the map to the specified area. In some implementations, three location bookmarks are added to the tool by default, but additional bookmarks can be added by uploading a preset XML file that contains labels, geographic coordinates, and zoom levels.
Geographic filter widget example

図３３は、いくつかの実装形態による、例示的なユーザインターフェース３３００を示す。いくつかの実装形態では、地理的フィルタウィジェットにより、ユーザは、追加の属性を通じて既存の国勢統計区をフィルタリングすることができる。例えば、ユーザは、（図３３に示されるように）人口密度が１平方マイル当たり１００人未満の国勢統計区のみ、または海岸から２０キロメートル未満の国勢統計区のみを探索することを決定することができる。デフォルトで、いくつかの実装形態は、言及されたものなどの追加の変数のサンプルセットを含む。いくつかの実装形態では、ユーザは、これらのデータに対して独自のＳＱＬタイプのクエリを書き込み、それらをプリセットＸＭＬファイルに追加し、それらをツールにアップロードすることができる。ファイルがロードされると、クエリで指定された基準を満たす国勢統計区のみがマップ上に表示される。これは、強力な機能であり、ユーザは、地域探索および類似性分析に入り込む前に、外部属性フィルタリングを介してデータを限定することができる。 Figure 33 shows an example user interface 3300, according to some implementations. In some implementations, a geographic filter widget allows users to filter existing census tracts through additional attributes. For example, a user may decide to search only for census tracts with a population density of less than 100 people per square mile (as shown in Figure 33), or only for census tracts less than 20 kilometers from the coast. By default, some implementations include a sample set of additional variables, such as those mentioned. In some implementations, users can write their own SQL-type queries against this data, add them to a preset XML file, and upload them to the tool. Once the file is loaded, only census tracts that meet the criteria specified in the query are displayed on the map. This is a powerful feature, allowing users to limit data through external attribute filtering before diving into area exploration and similarity analysis.
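The attribute filter can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL/PostGIS store mentioned earlier; the table name, column names, tract identifiers, and values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracts (geoid TEXT PRIMARY KEY, pop_density REAL, coast_km REAL)"
)
conn.executemany(
    "INSERT INTO tracts VALUES (?, ?, ?)",
    [("T1", 50.0, 5.0), ("T2", 5000.0, 10.0), ("T3", 80.0, 150.0)],
)


def filter_tracts(connection, where_clause):
    """Return the identifiers of tracts matching a user-supplied SQL condition.

    The condition is interpolated directly into the query, so this sketch is
    only safe for trusted input; a real tool would need to validate it.
    """
    query = f"SELECT geoid FROM tracts WHERE {where_clause} ORDER BY geoid"
    return [row[0] for row in connection.execute(query)]


low_density = filter_tracts(conn, "pop_density < 100")
near_coast = filter_tracts(conn, "coast_km < 20")
```

Only the tracts returned by the filter would be drawn on the map, so the similarity analysis that follows operates on the reduced set.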

上述したように、いくつかの実装形態は、社会経済的および人口統計学的変数を類似した特性の次元にグループ化し、類似性メトリックを計算し、および／またはユーザがこれらの次元を重み付して類似性がどのように計算されるかをカスタマイズすることを可能にする。これにより、ユーザは、類似性を定義する際に多くの柔軟性を得ることができる。いくつかの実装形態は、ユーザが１つの次元を使用することを可能にし、これは、ユーザが１つの次元（例えば、年齢分布）に基づく類似性を知りたいだけであるときに有用である。いくつかの実装形態は、ユーザが次元の組合せを使用することを可能にし、これは、ユーザが次元を異なるように重み付けすることを可能にしつつ（例えば、ユーザは、類似性を計算する際に収入をより重要にすることを望む場合、それに応じて重みを調整することができる）、複数の次元（例えば、年齢と収入）について知るのに有用であり得る。 As mentioned above, some implementations group socioeconomic and demographic variables into dimensions of similar characteristics, calculate a similarity metric, and/or allow the user to weight these dimensions to customize how similarity is calculated. This allows the user a lot of flexibility in defining similarity. Some implementations allow the user to use a single dimension, which is useful when the user only wants to know similarity based on one dimension (e.g., age distribution). Some implementations allow the user to use a combination of dimensions, which can be useful for knowing about multiple dimensions (e.g., age and income) while allowing the user to weight the dimensions differently (e.g., if the user wants income to be more important in calculating similarity, they can adjust the weight accordingly).

Some implementations use Jensen-Shannon divergence (JSD), a statistical method for measuring the (dis)similarity between two probability distributions. JSD itself is based on Kullback-Leibler divergence, a method often used to measure information gain in statistical models of inference. Unlike Kullback-Leibler divergence, JSD is symmetric, which makes it better suited to this application: similarity is conceptually easier to understand when the value comparing any two census tracts is the same in both directions, so that the similarity of census tract A -> census tract B equals that of census tract B -> census tract A. Assuming that all variables in the distributions are normalized across all of the attributes so that the values for each tract sum to 1, JSD values are always bounded between 0 (identical distributions) and 1 (complete dissimilarity).

To illustrate, consider the example shown in FIG. 1C. Image 104 in FIG. 1C shows census tracts within the Los Angeles region. In this example, three census tracts are selected. The distribution of the race variable for each is shown in chart 106 of FIG. 1D, according to some implementations. As might be expected visually and from spatial proximity, census tract 1 and census tract 2 are more similar to each other (their distributions in the chart look more alike) than census tract 1 and census tract 3 are. The JSD approach yields a value of 0.1359 between census tract 1 and census tract 2, indicating greater similarity, and a value of 0.7197 between census tract 1 and census tract 3, indicating less similarity. Because JSD is a dissimilarity measure, a value of 0 means the distributions are identical and a value of 1 means they are completely different.

このアプローチを使用して、いくつかの実装形態は次いで、国勢統計区で以下に示されるような、ロケーションのすべての可能なペア間の任意の特性についてのＪＳＤを計算する。いくつかの実装形態は、特徴的な色または輝度レベルを使用して、ユーザによって選択された国勢統計区をハイライト表示する（例えば、より明るい色）。例えば、いくつかの実装形態によれば、図１Ｅに示されるマップ１０８内のより明るい国勢統計区は、選択された国勢統計区であり、他の各国勢統計区は、この選択されたロケーションに対する相対的な類似性を示す。 Using this approach, some implementations then calculate the JSD for any property between all possible pairs of locations, such as those shown below with census tracts. Some implementations use a distinctive color or brightness level to highlight the census tract selected by the user (e.g., a brighter color). For example, according to some implementations, the brighter census tract in the map 108 shown in FIG. 1E is the selected census tract, and each of the other census tracts indicates a relative similarity to the selected location.

Some implementations show JSD across multiple categories of attributes and allow users to weight their importance in assessing similarity. In map 108, tooltips show similarity measures for the dimensions race, age, income, educational attainment, and mode of commuting. Some implementations show similarity values individually in the tooltips. Some implementations allow users to combine similarity measures to tailor the importance of different attributes to the particular question of interest. Having multiple dimensions for assessing similarity introduces a new challenge. Some implementations handle this by using a set of user-defined weights. Depending on the task, expertise, and topic of interest, different users may want to weight dimensions differently in the similarity model. To do this, some implementations calculate similarity as follows (note that all weights (w) sum to 1):

$$\mathrm{Sim} = \mathrm{JSD}_{\mathrm{Race}} \cdot w_1 + \mathrm{JSD}_{\mathrm{Age}} \cdot w_2 + \mathrm{JSD}_{\mathrm{Income}} \cdot w_3 + \mathrm{JSD}_{\mathrm{Education}} \cdot w_4 + \mathrm{JSD}_{\ldots} \cdot w_N$$

例えば、図１Ｆおよび図１Ｇは、いくつかの実装形態による、次元に対して異なる重みを使用して類似性を示す２つの異なるマップ１１０および１１２をそれぞれ示す。マップ１１０は、等しく重み付けされた人種、収入、および教育を調査している。マップ１１２は、類似性計算で使用される唯一の次元として人種を示す。 For example, Figures 1F and 1G show two different maps 110 and 112, respectively, that show similarity using different weights for dimensions, according to some implementations. Map 110 examines race, income, and education equally weighted. Map 112 shows race as the only dimension used in the similarity calculation.

図２は、いくつかの実装形態による、ユーザがミキサ重みを調整し、プリセットファイルをアップロードし、関心のある地域を制限し、および／またはマップタイプを変更することができる様々な方法を示すヘルプメニュー２０２を有するグラフィカルユーザインターフェース２００を示す。 FIG. 2 shows a graphical user interface 200 with a help menu 202 illustrating various ways a user can adjust mixer weights, upload preset files, restrict regions of interest, and/or change map types, according to some implementations.

As shown in FIGS. 3A and 3B, some implementations present a map in a graphical user interface (e.g., interfaces 300 and 302) and allow a user to select a location and/or area on the map (e.g., to select a particular census tract). In the description herein, data visualizations of census tracts (sometimes capitalized as "Census tracts") are used for illustrative purposes. It should be understood that these concepts can be applied to any type of statistical data, geographic information, and/or demographic information.

In some implementations, the user interface automatically updates to show all other census tracts similar to the selected census tract, an example of which is shown in graphical user interface 400 of FIG. 4. Some implementations display a legend 402 that ranges from the census tracts least similar to the user-selected census tract to those most similar to it. Similarity can be measured based on various demographic and economic dimensions.

Some implementations show these dimensions in a portion of a graphical user interface, together with corresponding slider affordances. An exemplary graphical user interface 500 having a portion 502 that includes such affordances, according to some implementations, is shown in FIG. 5. Some implementations allow the user to hover the cursor over each affordance to obtain more information about each category. According to some implementations, each slider affordance can be adjusted individually. Some implementations display accompanying text showing similarity details, including how many census tracts are similar to the selected census tract, examples of which are shown in portions of graphical user interfaces 600, 602, and 604 of FIGS. 6A-6C and portions of graphical user interfaces 700 and 702 of FIGS. 7A and 7B.

Some implementations display a table (e.g., table 802 in graphical user interface 800 of FIG. 8) showing the top-ranked similar census tracts. Each tract can be individually selected. Some implementations provide an icon (e.g., icon 902 in graphical user interface 900 of FIG. 9) to zoom in on each tract or region. Some implementations also allow the user to sort based on similarity percentage (e.g., sorting the list by descending similarity allows the user to see the least similar or dissimilar census tracts), examples of which are shown in graphical user interfaces 1000 and 1100 of FIGS. 10 and 11. Some implementations provide multiple map-style representations for displaying similarity. An example of this is shown in FIG. 12, which shows two options 1202 in graphical user interface 1200. Some implementations support a choropleth map type in which the gradient from maximum to minimum is shown (e.g., the exemplary map shown in FIG. 12). Some implementations support map types in which data is shown as maximum or minimum (e.g., top 100, bottom 10, or top 1000) using two color palettes (e.g., the maps shown in graphical user interfaces 1300, 1400, and 1500 of FIGS. 13, 14, and 15, respectively). Some implementations show different color coding for the most similar tracts (e.g., red) and the least similar tracts (e.g., blue).

As shown in graphical user interface 1600 of FIG. 16, some implementations show predefined locations that the user can select. Some implementations allow the user to add locations. Some implementations allow the user to draw a polygon on the map, an example of which is shown in graphical user interface 1700 of FIG. 17A, which shows an affordance 1702 that allows the user to draw a polygon. When the user selects a region by drawing a polygon (e.g., by selecting different locations on the map), similarities for that particular region are shown (e.g., similar census tracts are shown). An example polygon 1706 is shown in graphical user interface 1704 of FIG. 17B, according to some implementations. Some implementations allow export of data (e.g., by clicking affordance 1708) so that the data can be analyzed outside the interface. FIG. 18 shows graphical user interface 1800, which is updated after the user selects a region by drawing a polygon (as described above), according to some implementations. As shown in the example graphical user interface 1900 of FIG. 19, some implementations provide an affordance 1902 that allows the user to print the map after making adjustments.

Some implementations provide "presets" so that users can constrain values (e.g., based on values in an external document) instead of manually modifying the mixer (e.g., selecting values for individual categories by adjusting slider affordances), examples of which are shown in graphical user interfaces 2100 (e.g., portion 2102), 2200, and 2300 of FIGS. 21, 22, and 23, respectively. An exemplary XML file 2400 is shown for illustrative purposes in FIG. 24, according to some implementations. This example shows a specified location name along with a density parameter (e.g., a population density of less than 100 people per square mile). Some implementations allow users to upload preset files, an example of which is shown in graphical user interface 2500 of FIG. 25, where preset file 2502 has been uploaded by a user. Some implementations allow users to share external files using a URL (e.g., a share URL 2504). According to some implementations, once an external file is uploaded, the display automatically updates to reflect the values in the current file. An example update, according to some implementations, is shown in graphical user interface 2600 of FIG. 26. Some implementations present several presets for the user to select from. Some implementations automatically update the mixer (the values for the different categories) based on the selected preset. Some implementations allow the user to enable additional tooltip details, examples of which are shown in graphical user interfaces 2700 and 2800 of FIGS. 27 and 28, respectively.
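
As a rough illustration of how a preset file could drive the mixer, the sketch below parses a small XML document into a location name, a density limit, and a set of weights. The element and attribute names (`preset`, `location`, `density`, `weight`, `dimension`, `value`, `max`) are assumptions made for this example; the actual layout of XML file 2400 in FIG. 24 is not reproduced in the text, so this should not be read as its schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset layout. The tag and attribute names are illustrative
# assumptions, not the schema of the XML file shown in FIG. 24.
PRESET_XML = """
<preset>
  <location name="Example Location"/>
  <density max="100"/>
  <weight dimension="race" value="0.5"/>
  <weight dimension="income" value="0.25"/>
  <weight dimension="education" value="0.25"/>
</preset>
"""


def load_preset(xml_text: str) -> dict:
    """Parse a preset document into plain Python values."""
    root = ET.fromstring(xml_text)
    weights = {w.get("dimension"): float(w.get("value"))
               for w in root.findall("weight")}
    return {
        "location": root.find("location").get("name"),
        # Upper bound on population density (people per square mile).
        "max_density": float(root.find("density").get("max")),
        "weights": weights,
    }


preset = load_preset(PRESET_XML)
print(preset["location"], preset["max_density"], preset["weights"])
```

After loading, an application along these lines could set each slider from `preset["weights"]` and filter regions by `preset["max_density"]`, which matches the described behavior of the display updating once a file is uploaded.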

FIG. 29 is a block diagram illustrating a computing device 2900 capable of displaying the graphical user interfaces described above, according to some implementations. Various examples of computing device 2900 include desktop computers, laptop computers, tablet computers, and other computing devices having a display and a processor capable of executing a data visualization application 2930. The computing device 2900 typically includes one or more processing units (processors or cores) 2902, one or more network or other communication interfaces 2904, memory 2906, and one or more communication buses 2908 for interconnecting these components. The communication buses 2908 optionally include circuitry (sometimes referred to as a chipset) that interconnects and controls communication between system components. The computing device 2900 includes a user interface 2910. The user interface 2910 typically includes a display device 2912. In some implementations, the computing device 2900 includes input devices such as a keyboard, mouse, and/or other input buttons 2916. Alternatively or additionally, in some implementations, the display device 2912 includes a touch-sensitive surface 2914, in which case the display device 2912 is a touch-sensitive display. In some implementations, the touch-sensitive surface 2914 is configured to detect various swipe gestures (e.g., continuous gestures in the vertical and/or horizontal direction) and/or other gestures (e.g., single taps or double taps). In computing devices with a touch-sensitive display 2914, a physical keyboard is optional (e.g., a soft keyboard may be displayed when keyboard input is required). The user interface 2910 also includes an audio output device 2918, such as a speaker or an audio output connection connected to a speaker, earphones, or headphones. Furthermore, some computing devices 2900 use a microphone and voice recognition to supplement or replace a keyboard. Optionally, the computing device 2900 includes an audio input device 2920 (e.g., a microphone) for capturing audio (e.g., speech from a user).

Memory 2906 includes high-speed random-access memory such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices, and may include non-volatile memory such as one or more magnetic storage disk devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. In some implementations, memory 2906 includes one or more storage devices located remotely from processor(s) 2902. Memory 2906, or alternatively the non-volatile memory device(s) within memory 2906, includes a non-transitory computer-readable storage medium. In some implementations, memory 2906 or the computer-readable storage medium of memory 2906 stores the following programs, modules, and data structures, or a subset or superset thereof:
・an operating system 2922 that contains procedures for handling various basic system services and performing hardware-dependent tasks;
・a communications module 2924 used to connect the computing device 2900 to other computers and devices via one or more communications network interfaces 2904 (wired or wireless), such as the Internet, other wide area networks, local area networks, metropolitan area networks, etc.;
・a web browser 2926 (or other application capable of displaying web pages) that allows the user to communicate with remote computers or devices over a network;
・optionally, an audio input module 2928 (e.g., a microphone module) for processing audio captured by the audio input device 2920. The captured audio may be transmitted to a remote server and/or processed by an application (e.g., the data visualization application 2930) executing on the computing device 2900;
・a data visualization application 2930 for generating data visualizations and associated features. The application 2930 includes a graphical user interface 2932 through which a user builds visual graphics. For example, a user selects one or more data sources 2940 (which may be stored on the computing device 2900 or stored remotely), selects data fields from the data source(s), and defines a visual graphic using the selected fields; and
・zero or more databases or data sources 2940 (e.g., a first data source 2940-1 and a second data source 2940-2) used by the data visualization application 2930. In some implementations, the data sources are stored as spreadsheet files, CSV files, text files, JSON files, XML files, or flat files, or are stored in a relational database.

In some implementations, the data visualization application 2930 includes a data visualization generation module 2934 that takes user input (e.g., a visual specification 2936) and generates a corresponding visual graphic. The data visualization application 2930 then displays the generated visual graphic in the user interface 2932. In some implementations, the data visualization application 2930 runs as a stand-alone application (e.g., a desktop application). In some implementations, the data visualization application 2930 runs within the web browser 2926 or another application (e.g., a server-based application) that uses web pages served by a web server.

In some implementations, information provided by the user (e.g., user input) is stored as a visual specification 2936. In some implementations, the visual specification 2936 includes previous natural language commands received from the user or properties specified by the user through natural language commands.

In some implementations, the data visualization application 2930 includes a language processing module 2938 for processing (e.g., interpreting) commands provided by a user of the computing device. In some implementations, the commands are natural language commands (e.g., captured by the audio input device 2920). In some implementations, the language processing module 2938 includes sub-modules such as an auto-complete module, a pragmatics module, and an ambiguity module, each of which is described in further detail below.

In some implementations, the memory 2906 stores metrics and/or scores determined by the language processing module 2938. In addition, the memory 2906 may store thresholds and other criteria that are compared against the metrics and/or scores determined by the language processing module 2938. For example, the language processing module 2938 may determine a relevance metric (described in more detail below) for an analytical word or phrase of a received command. The language processing module 2938 may then compare the relevance metric to a threshold stored in the memory 2906.

Details of the various data structures and modules of computing device 2900, according to some implementations, are further described below with reference to FIG. 34.

Each of the above-identified executable modules, applications, or sets of procedures may be stored in one or more of the aforementioned memory devices and corresponds to a set of instructions for performing a function described above. The above-identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules; thus, various subsets of these modules may be combined or otherwise rearranged in various implementations. In some implementations, memory 2906 stores a subset of the above-identified modules and data structures. Additionally, memory 2906 may store additional modules or data structures not described above.

Although FIG. 29 illustrates a computing device 2900, FIG. 29 is intended as a functional description of various features that may be present, rather than as a structural schematic of the implementations described herein. In practice, as will be recognized by those skilled in the art, items shown separately may be combined and some items may be separated.

FIG. 34 shows a flowchart of an exemplary method 3400 for visual analysis of data, according to some implementations. Method 3400 is executed by computing device 2900 using one or more modules in memory 2906. A user selects a data source (e.g., data source 2940-1). Computing device 2900 receives (3402) the user selection of the data source (e.g., via user interface 2910). In response, computing device 2900 (e.g., data visualization generation module 2934) presents (3404) a graphical user interface (e.g., graphical user interface 2932, various examples of which are described above) for analysis of data in the data source. The data includes geospatial data points. Computing device 2900 also presents (3406) a map data visualization (e.g., map data visualization 3102 in graphical user interface 3100) in the graphical user interface. The map data visualization includes a plurality of geographic regions (e.g., census tracts), each corresponding to one or more respective geospatial data points.

In response to receiving (3408) a first user input to select a first set of one or more geographic regions from the plurality of geographic regions, the computing device 2900 calculates (3410), using one or more statistical techniques, a similarity between the first set of one or more geographic regions and a second set of one or more geographic regions from the plurality of geographic regions, based on a set of attributes (e.g., data fields from the selected data source).

In some implementations, the set of attributes includes one or more socioeconomic, demographic, and geographic variables. Examples of such attributes (sometimes referred to as dimensions) are described above with reference to the exemplary mixer widget. For example, a census tract may include multiple dimensions. In some implementations, each attribute in the set of attributes is associated with a corresponding weight from a plurality of weights, and the method further includes calculating the similarity based on the plurality of weights. In some implementations, the method 3400 further includes providing one or more affordances, each affordance corresponding to a respective attribute in the set of attributes. In some implementations, the method further includes, in response to receiving a second user input to select a first affordance of the one or more affordances, (i) adjusting a first weight corresponding to a first attribute that corresponds to the first affordance, to obtain an updated set of weights, (ii) calculating, using the one or more statistical techniques, an updated similarity between the first one or more geographic regions and the second one or more geographic regions based on the updated set of weights, and (iii) updating and displaying (3412) the map data visualization according to the updated similarity. Examples of these steps are described above with reference to the mixer widget, according to some implementations. In some implementations, method 3400 further includes (i) providing a store affordance for storing the updated set of weights, and (ii) in response to the user selecting the store affordance, storing the updated set of weights in a preset file for a next session. In some implementations, method 3400 further includes, for the next session, retrieving the preset file and using the updated set of weights to calculate the similarity between the first set of one or more geographic regions and the second set of one or more geographic regions. Examples of the use of preset files are described above with reference to the preset widget, according to some implementations.
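
One way to picture step (i), adjusting a single weight to obtain an updated set of weights, is sketched below. The text requires only that the weights sum to 1; the proportional rescaling of the remaining weights is an assumption made for this example, and the name `adjust_weight` is hypothetical.

```python
from typing import Dict


def adjust_weight(weights: Dict[str, float], dimension: str,
                  new_value: float) -> Dict[str, float]:
    """Set one weight and rescale the others so the total stays 1.

    Proportional rescaling is an illustrative assumption; other policies
    for keeping the weights normalized are equally possible.
    """
    if dimension not in weights:
        raise KeyError(dimension)
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("a weight must lie between 0 and 1")
    n_others = len(weights) - 1
    if n_others == 0:
        return {dimension: 1.0}  # a single dimension always carries all weight
    other_total = sum(w for d, w in weights.items() if d != dimension)
    remaining = 1.0 - new_value

    def rescale(w: float) -> float:
        if other_total > 0:
            return w / other_total * remaining
        return remaining / n_others  # others were all zero: split evenly

    return {d: new_value if d == dimension else rescale(w)
            for d, w in weights.items()}


weights = {"race": 0.2, "income": 0.4, "education": 0.4}
updated = adjust_weight(weights, "race", 0.5)
print(updated)
```

The updated dictionary could then be passed to the similarity calculation of step (ii) and, if the user chooses to store it, written to a preset file for a later session.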

In some implementations, the method 3400 further includes, in response to receiving a second user input selecting coordinates of a search polygon on the map data visualization, defining the second set of one or more geographic regions based on the coordinates. In some implementations, the method further includes comparing the coordinates of the search polygon with the corresponding geospatial data points for each of the plurality of geographic regions to identify the second set of one or more geographic regions. Examples of using search polygons (sometimes referred to as drawing polygons) are described above with reference to FIGS. 17A, 17B, 20A, and 20B, according to some implementations.
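
The comparison of search-polygon coordinates with each region's geospatial data point can be illustrated with a standard ray-casting point-in-polygon test. This sketch assumes that each region is represented by a single point (for example, a centroid) and treats coordinates as planar; both are simplifying assumptions, and the identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' for each edge crossed by a ray toward +x."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def regions_in_polygon(region_points: Dict[str, Point],
                       polygon: List[Point]) -> List[str]:
    """Return the ids of regions whose representative point is in the polygon."""
    return [rid for rid, pt in region_points.items()
            if point_in_polygon(pt, polygon)]


# Hypothetical search polygon and tract points.
search_polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
tracts = {"tract_a": (1.0, 1.0), "tract_b": (5.0, 1.0), "tract_c": (3.5, 2.5)}
selected = regions_in_polygon(tracts, search_polygon)
print(selected)
```

A production system would more likely rely on a geospatial library and test full region geometries rather than single points; the sketch only shows the underlying comparison.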

Referring back to FIG. 34, the computing device 2900 updates and displays (3412) the map data visualization according to the calculated similarity. For example, the data visualization generation module 2934 may update and display the map data visualization according to the similarity calculated in the previous step. In some implementations, updating the map data visualization includes highlighting or lowlighting at least one geographic region of the second one or more geographic regions. Examples of such highlighting or lowlighting are described above with reference to FIG. 4, according to some implementations.

In some implementations, the map data visualization includes a choropleth map, and updating and displaying the map data visualization according to the calculated similarity includes displaying a gradient from maximum similarity to minimum similarity. In some implementations, method 3400 further includes (i) providing a first affordance for selecting a choropleth map and a second affordance for selecting a max-min map; (ii) in response to user selection of the first affordance, displaying a gradient from maximum similarity to minimum similarity; and (iii) in response to user selection of the second affordance, displaying the most similar regions and the least similar regions.

In some implementations, method 3400 further includes (i) providing a plurality of affordances, each affordance corresponding to a respective maximum number of regions; and (ii) in response to user selection of one of the plurality of affordances, displaying the most similar regions and the least similar regions within the second set of one or more regions, based on the maximum number of regions corresponding to the selected affordance.
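
Selecting the most and least similar regions for a given maximum count amounts to ranking the regions and taking both ends of the ranking. The sketch below assumes each region already has a similarity score in which a larger value means "more similar" (for example, a similarity percentage); the function name and the scores are hypothetical.

```python
from typing import Dict, List, Tuple


def most_and_least_similar(scores: Dict[str, float],
                           max_regions: int) -> Tuple[List[str], List[str]]:
    """Return (most similar ids, least similar ids), each at most max_regions long.

    Assumes a larger score means a closer match to the selected region.
    """
    if max_regions < 1:
        raise ValueError("max_regions must be at least 1")
    ranked = sorted(scores, key=scores.get, reverse=True)
    most = ranked[:max_regions]          # best matches first
    least = ranked[::-1][:max_regions]   # worst matches first
    return most, least


# Hypothetical similarity scores for five tracts.
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4, "e": 0.1}
most, least = most_and_least_similar(scores, 2)
print(most, least)
```

The two returned lists map naturally onto the two color palettes of the max-min map type. When `max_regions` exceeds half the number of regions, the lists overlap, and a real interface would need to decide how to color such regions.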

In some implementations, the method 3400 further includes (i) providing a plurality of affordances, each affordance corresponding to a respective subset of a plurality of sub-regions, and (ii) in response to user selection of an affordance of the plurality of affordances, (a) ceasing presentation of the map data visualization and (b) presenting an alternative map data visualization within the graphical user interface. The alternative map data visualization includes the subset of sub-regions corresponding to the selected affordance.

In some implementations, the graphical user interface includes a first portion and a second portion. The method 3400 further includes displaying the map data visualization in the first portion and displaying, in the second portion, a summary of the similarity between the first one or more geographic regions and the second one or more geographic regions. For example, in FIG. 31, the graphical user interface 3100 includes a first portion for showing the map data visualization 3102 and a second portion, labeled G, for showing the summary.

In some implementations, calculating the similarity includes calculating, for the set of attributes, a semantic similarity matrix for the first one or more geographic regions and the second one or more geographic regions of the plurality of geographic regions.

In some implementations, calculating the similarity includes calculating the Jensen-Shannon divergence (JSD) between pairs of geographic regions drawn from the first one or more geographic regions and the second one or more geographic regions.
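
For reference, the Jensen-Shannon divergence between two regions for a single attribute can be computed from that attribute's category counts. The sketch below uses the standard definition, JSD(P, Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = ½(P + Q), and base-2 logarithms so the result lies between 0 and 1. The income-bracket counts are made up for illustration, and the choice of logarithm base is an assumption.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw category counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p_counts: Sequence[float],
                              q_counts: Sequence[float]) -> float:
    """JSD between two regions' distributions over the same categories."""
    if len(p_counts) != len(q_counts):
        raise ValueError("both regions need the same categories")
    p, q = normalize(p_counts), normalize(q_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical household counts per income bracket for two tracts.
tract_a = [120, 300, 80]
tract_b = [100, 310, 90]
print(jensen_shannon_divergence(tract_a, tract_b))  # small: similar tracts
print(jensen_shannon_divergence([1, 0], [0, 1]))    # maximal: no overlap
```

Computing this value for every pair of regions and every attribute would yield the per-dimension inputs to the weighted similarity described earlier.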

In this way, the techniques described herein support a user-driven approach to determining the similarity of geographic regions during an analytical workflow. A user can select any location of interest, and the system compares the socioeconomic and demographic characteristics of that location with those of all other geographic regions within a given administrative division, with the goal of identifying similar and dissimilar locations. Some implementations allow users to adjust parameters of the similarity model and save them as a preset file for future analysis. Some implementations provide intuitive, configurable affordances for exploring place similarity, to help users make relevance judgments about the geographic features they are comparing.

The terminology used in the description of the invention herein is for the purpose of describing particular implementations only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or," as used herein, refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.

The foregoing description has, for purposes of explanation, been given with reference to specific implementations. However, the illustrative discussion above is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of the invention and its practical applications, and thereby to enable others skilled in the art to best utilize the invention and the various implementations, with various modifications as are suited to the particular use contemplated.

Claims

1. A method of visual analysis of a dataset, performed by a computing system having one or more processors and memory storing one or more programs configured to be executed by the one or more processors, the method comprising:
receiving a user selection of a data source;
presenting a graphical user interface for analysis of data in the data source, the data including geospatial data points;
presenting a map data visualization within the graphical user interface, the map data visualization including a plurality of geographic regions, each geographic region corresponding to a respective one or more geospatial data points; and
in response to receiving a first user input to select a first set of one or more geographic regions of the plurality of geographic regions:
calculating, using one or more statistical techniques, a similarity between the first set of one or more geographic regions and a second set of one or more geographic regions of the plurality of geographic regions based on a set of data fields from the data source; and
updating and displaying the map data visualization according to the calculated similarity.

The method of claim 1, wherein the set of data fields includes one or more socioeconomic, demographic, and geographic variables.

The method of claim 1, wherein updating the map data visualization includes highlighting or lowlighting at least one geographic region of the second set of one or more geographic regions.

The method of claim 1, further comprising: in response to receiving a second user input selecting coordinates of a search polygon on the map data visualization, defining the second set of one or more geographic regions based on the coordinates.

5. The method of claim 4, further comprising: comparing the coordinates of the search polygon to the corresponding geospatial data points for each geographic region of the plurality of geographic regions to identify the second set of one or more geographic regions.

The method of claim 1, wherein each data field of the set of data fields is associated with a corresponding weight from a plurality of weights, the method further comprising: calculating the similarity based on the plurality of weights.

The method of claim 6, further comprising: providing one or more affordances, each affordance corresponding to a respective data field of the set of data fields.

in response to receiving a second user input to select a first affordance of the one or more affordances:
adjusting a first weight corresponding to a first data field corresponding to the first affordance to obtain an updated set of weights;
calculating, using the one or more statistical techniques, an updated similarity between the first set of one or more geographic regions and the second set of one or more geographic regions based on the updated set of weights;
and updating and displaying the map data visualization according to the updated similarity.

providing a store affordance for storing the updated set of weights;
and in response to a user selecting the store affordance, storing the updated set of weights in a preset file for a next session.

10. The method of claim 9, further comprising: retrieving the preset file and using the updated set of weights to calculate the similarity between the first set of one or more geographic regions and the second set of one or more geographic regions for the next session.

The method of claim 1, wherein the map data visualization includes a choropleth map, and the step of updating and displaying the map data visualization according to the calculated similarity includes displaying a gradient from maximum similarity to minimum similarity.

providing a first affordance for selecting a choropleth map and a second affordance for selecting a max-min map;
displaying a gradient from maximum similarity to minimum similarity in response to a user selection of the first affordance;
and displaying the most similar and least similar regions in response to a user selection of the second affordance.

13. The method of claim 1, further comprising:
providing a plurality of affordances, each affordance corresponding to a respective maximum number of regions;
and in response to a user selection of one affordance of the plurality of affordances, displaying the most similar regions and the least similar regions in the second set of one or more geographic regions based on the maximum number of regions corresponding to the affordance.

14. The method of claim 1, further comprising:
providing a plurality of affordances, each affordance corresponding to a respective subset of sub-regions of the plurality of sub-regions;
and in response to a user selection of an affordance from the plurality of affordances:
ceasing the presentation of the map data visualization;
and presenting an alternative map data visualization within the graphical user interface, the alternative map data visualization including the subset of sub-regions corresponding to the affordance.

The graphical user interface includes a first portion and a second portion, and the method includes:
displaying the map data visualization in the first portion;
and displaying in the second portion a summary of the similarities between the first set of one or more geographic regions and the second set of one or more geographic regions.

The method of claim 1, wherein each of the geographic regions corresponds to a respective census tract.

The method of claim 1, wherein the step of calculating the similarity includes calculating, for the set of data fields, a semantic similarity matrix for the first set of one or more geographic regions and the second set of one or more geographic regions of the plurality of geographic regions.

The method of claim 1, wherein the step of calculating the similarity includes calculating Jensen-Shannon Divergence (JSD) between pairs of geographic regions in the first set of one or more geographic regions and the second set of one or more geographic regions.

19. A computer system for visual analysis of a dataset, comprising:
one or more processors;
and memory,
wherein the memory stores one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any one of claims 1 to 18.

20. A program comprising instructions that, when executed by a computer system having a display, one or more processors, and memory, cause the computer system to perform the method of any one of claims 1 to 18.