JP2018155997A

JP2018155997A - Apparatus for retrieving facilities

Info

Publication number: JP2018155997A
Application number: JP2017054286A
Authority: JP
Inventors: Mcgovern John
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2017-03-21
Filing date: 2017-03-21
Publication date: 2018-10-04

Abstract

[Problem] To provide a facility search apparatus capable of improving the recognition rate for facility names. [Solution] A voice input unit 42 of a navigation device 100 includes: a pronunciation score calculation unit 46 that performs speech recognition on a facility name uttered by a user and calculates, as a pronunciation score, a similarity for each of a plurality of facility names; a total score calculation unit 50 that calculates a total score, serving as the likelihood of each facility name, based on the pronunciation score and on a store-count score derived from the number of stores in the chain to which each facility belongs (attribute information); and a recognition result output unit 52 that outputs a speech recognition result for the uttered facility name based on the calculated total scores. [Selected figure] FIG. 1

Description

The present invention relates to a facility search apparatus that is mounted in a vehicle or the like and extracts a matching facility from voice input.

Conventionally, a speech recognition apparatus is known that extracts features from a place name spoken by the user, calculates a likelihood for each of a plurality of recognition target words registered in a speech recognition dictionary, and then multiplies each likelihood by a weighting coefficient set according to the amount of information, such as street names, in the hierarchy below that place name. The resulting weighted likelihoods yield the speech recognition result (see, for example, Patent Document 1). For example, when the user utters the name of a city containing many streets, the weighted likelihood calculated for that city name is higher than its original likelihood, so the city name is more likely to be obtained as the recognition result.

Patent Document 1: Japanese Patent No. 5199391

However, the speech recognition apparatus disclosed in Patent Document 1 targets place names that have a tree structure representing a hierarchical relationship, and it cannot apply weighting coefficients when recognizing names that have no such tree structure. For example, when speech recognition is performed on a facility name such as a store name uttered by the user, the facility name generally has no tree structure, so the recognition rate cannot be improved by recognition using weighting coefficients.

The present invention was made in view of these points, and its object is to provide a facility search apparatus capable of improving the recognition rate of facility names.

To solve the above problem, the facility search apparatus of the present invention includes: similarity calculation means for performing speech recognition on a facility name uttered by a user and calculating a similarity for each of a plurality of recognition target words; likelihood calculation means for calculating a likelihood for each of the recognition target words based on that word's similarity and its corresponding attribute information; and recognition result output means for outputting a speech recognition result for the uttered facility name based on the likelihoods calculated by the likelihood calculation means.

By combining the similarity between the spoken facility name and each recognition target word with attribute information corresponding to each word, the recognition target words can be differentiated in how likely they are to be selected, independently of the acoustic recognition result. The recognition rate of facility names can therefore be improved compared with obtaining the recognition result from the user's speech alone.

It is also desirable that the recognition result output means display, as the speech recognition result, the facility names indicated by a predetermined number of recognition target words in descending order of the likelihood calculated by the likelihood calculation means, with the facility name having the highest likelihood placed at the top. Because the topmost facility name is then the one most likely to be correct, the correct facility name is the most noticeable and is easy to select.

It is also desirable that the recognition result output means divide the facility names indicated by the predetermined number of recognition target words into a plurality of pages and display the speech recognition result so that the facility name with the highest likelihood appears on the first page. That facility name is then never placed on the second or a later page, so cumbersome operations such as turning pages to select it become unnecessary.

It is also desirable that the likelihood calculation means calculate the likelihood by applying predetermined weights to a first value corresponding to the similarity and a second value corresponding to the attribute information and summing the results. This allows the contribution of the attribute information to the likelihood to be adjusted so that specific facilities are obtained preferentially as the speech recognition result.

It is also desirable that the attribute information be information correlated with how well known the facility name corresponding to each recognition target word is. Alternatively, when the facility corresponding to a recognition target word is a store of a chain, the attribute information is desirably information indicating the number of stores belonging to that chain. Alternatively, the attribute information is desirably information indicating that the facility corresponding to a recognition target word is famous nationwide and/or famous in one of a plurality of regions into which the country is divided. In general, when several facilities sound similar to the user's utterance, the likelihood can be raised according to how well known each one is (for example, a chain with many stores, or a facility known to be famous nationwide or in a region), so the recognition rate can be improved.

FIG. 1 is a diagram showing the detailed configuration of a navigation device according to an embodiment.
FIG. 2 is a flowchart showing the procedure for performing speech recognition on input speech and extracting one facility.
FIG. 3 is a diagram showing a specific example of total scores calculated from pronunciation scores and store-count scores.
FIG. 4 is a diagram showing a display example of recognition results.
FIG. 5 is a diagram showing a specific example of total scores calculated from pronunciation scores and fame scores.
FIG. 6 is a diagram showing a specific example of total scores calculated from pronunciation scores, store-count scores, and fame scores.

Hereinafter, a navigation device according to an embodiment to which the facility search apparatus of the present invention is applied will be described with reference to the drawings.

FIG. 1 is a diagram showing the detailed configuration of a navigation device 100 according to an embodiment. As shown in FIG. 1, the navigation device 100 includes a navigation controller 1, a map data storage device 2, an operation unit 3, a vehicle position detection unit 4, a display device 5, an audio unit 6, and a microphone 7. The navigation device 100 is mounted in a vehicle.

The navigation controller 1 executes a predetermined operating program using a CPU, ROM, RAM, and the like, thereby implementing various functions: displaying a map image of the area around the vehicle position, searching for facilities, route search processing that sets a travel route connecting a departure point and a destination, and route guidance that guides the vehicle along that travel route. The detailed configuration of the navigation controller 1 is described later.

The map data storage device 2 consists of a storage medium holding map data and a device for reading it. The map data includes data used for map display and for route search and guidance, as well as search data needed for facility search. The search data includes a speech recognition dictionary needed for speech recognition when a facility name is entered by voice, and a facility DB (database) holding the name, location, category, and other detailed information of each facility.

The map data storage device 2 stores map data in units of rectangular map sheets, obtained by dividing the map into suitably sized areas along lines of longitude and latitude. The map data of each sheet is identified, and can be read out, by specifying its sheet number. The map data storage device 2 is implemented by a hard disk drive or semiconductor memory, or by a DVD and its drive. Alternatively, the map data storage device 2 may be replaced by a communication device that acquires map data from an external map distribution server (not shown).
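
As a rough illustration of sheet-based storage, the sketch below computes a sheet number from a position and recovers a sheet's extent from its number. The document does not say how sheets are sized or numbered, so the grid origin, the 0.125-degree sheet size, the row-major numbering, and the names `sheet_number` and `sheet_bounds` are all hypothetical.

```python
import math

# Hypothetical grid: the source does not specify sheet size, origin or numbering.
SHEET_DEG = 0.125    # assumed sheet width and height, in degrees
ORIGIN_LON = 120.0   # assumed west edge of the covered area
ORIGIN_LAT = 20.0    # assumed south edge of the covered area
COLUMNS = 200        # assumed sheets per row (covers 120 to 145 degrees E)
ROWS = 200           # assumed number of rows (covers 20 to 45 degrees N)


def sheet_number(lon: float, lat: float) -> int:
    """Return the hypothetical row-major sheet number containing (lon, lat)."""
    col = math.floor((lon - ORIGIN_LON) / SHEET_DEG)
    row = math.floor((lat - ORIGIN_LAT) / SHEET_DEG)
    if not (0 <= col < COLUMNS and 0 <= row < ROWS):
        raise ValueError("position outside the covered area")
    return row * COLUMNS + col


def sheet_bounds(number: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) of a sheet; inverse of sheet_number."""
    row, col = divmod(number, COLUMNS)
    west = ORIGIN_LON + col * SHEET_DEG
    south = ORIGIN_LAT + row * SHEET_DEG
    return west, south, west + SHEET_DEG, south + SHEET_DEG
```

Under these assumed parameters, `sheet_number(139.7, 35.68)` yields 25157, and reading that area would mean fetching the map data stored under that sheet number.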

The operation unit 3 accepts instructions (operations) from the user and includes various operation buttons and knobs. It also includes a touch panel attached to the screen of the display device 5, so that the user can give an operation instruction by touching part of the screen directly with a finger or the like. The vehicle position detection unit 4 includes, for example, a GPS receiver, a heading sensor, and a distance sensor; it detects the vehicle position (longitude and latitude) at predetermined timings and outputs the detection result.

The display device 5 is, for example, an LCD (liquid crystal display) and displays a map image of the area around the vehicle position and the like based on the video signal output from the navigation controller 1. The audio unit 6 outputs into the vehicle cabin guidance speech and the like generated from the audio signal input from the navigation controller 1. The microphone 7 is used for voice input of facility names and the like and picks up the user's voice.

Next, the detailed configuration of the navigation controller 1 will be described. The navigation controller 1 shown in FIG. 1 includes a map buffer 10, a map read control unit 12, a map drawing unit 14, a vehicle position calculation unit 20, a route search processing unit 30, a route guidance processing unit 32, a destination setting unit 34, a facility search unit 40, an input processing unit 60, and a display processing unit 70.

The map buffer 10 temporarily stores map data and search data read from the map data storage device 2. This map data includes at least the data needed to draw map images and the data needed for route search and route guidance. The map read control unit 12 outputs to the map data storage device 2 a request to read map data for a predetermined area, according to the vehicle position calculated by the vehicle position calculation unit 20 or a position specified by the user through the operation unit 3. Based on the normal map data stored in the map buffer 10, the map drawing unit 14 performs the drawing processing needed to display a map image on the display device 5 and creates map drawing data.

The vehicle position calculation unit 20 calculates the vehicle position from the detection data output by the vehicle position detection unit 4 and, when the calculated position does not lie on a road in the map data, performs map matching to correct it.
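
The document does not describe its map-matching method. As a stand-in, the sketch below uses the simplest geometric approach: snap the calculated position to the closest point on the nearest road segment, leaving it unchanged if it is already within a tolerance of a road. It assumes planar coordinates in metres and a road network given as a plain list of segments; the names and the 1 m tolerance are hypothetical, and practical map matching usually also weighs heading and travel history.

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def closest_point_on_segment(p: Point, seg: Segment) -> Point:
    """Orthogonal projection of p onto the segment, clamped to its end points."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a single point
        return (ax, ay)
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def map_match(p: Point, roads: list[Segment], on_road_tol: float = 1.0) -> Point:
    """Snap p to the nearest road unless it is already within on_road_tol of one."""
    candidates = (closest_point_on_segment(p, s) for s in roads)
    best = min(candidates, key=lambda q: math.dist(p, q))
    return p if math.dist(p, best) <= on_road_tol else best
```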

The route search processing unit 30 calculates the minimum-cost travel route (guidance route) from the departure point to the destination by route search processing under predetermined search conditions. The route guidance processing unit 32 creates guidance route drawing data for displaying the travel route found by the route search processing unit 30 superimposed on the map and for displaying guidance diagrams of intersections where the vehicle turns right or left; it also generates the audio signals, such as intersection guidance, needed to guide the vehicle along the travel route (route guidance operation). The destination setting unit 34 sets the destination used in the route search processing performed by the route search processing unit 30.
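
The route search is described only as finding the minimum-cost route under predetermined search conditions; no algorithm is named. Dijkstra's algorithm is a common choice for this kind of search, and the sketch below swaps it in over a hypothetical adjacency-list graph whose link costs are assumed to be non-negative.

```python
import heapq


def shortest_route(graph: dict[str, list[tuple[str, float]]],
                   start: str, goal: str) -> tuple[float, list[str]]:
    """Dijkstra's algorithm: return (total cost, node list), or (inf, []) if unreachable."""
    dist = {start: 0.0}
    prev: dict[str, str] = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:  # stale queue entry
            continue
        visited.add(node)
        if node == goal:     # goal's cost is final once it is popped
            break
        for nxt, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:  # walk predecessors back to the start
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```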

The facility search unit 40 extracts a facility specified by the user and retrieves detailed information on the extracted facility. Here, it is assumed that the user specifies the facility by voice input. The facility search unit 40 includes a voice input unit 42, a facility selection unit 54, a facility DB (database) 56, and a detailed information search unit 58.

When the user speaks a facility name, the voice input unit 42 performs speech recognition on the input speech and extracts a plurality of candidate facility names. The extracted facility names are passed to the display processing unit 70 as the speech recognition result and displayed on the display device 5. The facility selection unit 54 selects one of the facility names extracted by the voice input unit 42 according to the user's instruction. The facility DB 56 stores detailed information on each facility, including, for example, the facility category (type of business) and business details; a chain ID identifying the chain, if the facility is a chain store; and a fame flag indicating whether the facility is famous nationwide or in some regions (for example, "1" if it is famous nationwide, "2" if it is famous in some regions, and "0" otherwise). The detailed information search unit 58 uses the facility DB 56 to retrieve detailed information on the facility corresponding to the facility name selected by the facility selection unit 54.
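
The facility DB entries described above can be pictured as a simple record. The flag values 0, 1, and 2 come from the text; the field names, the `FameFlag` enum, and especially the numeric fame scores are hypothetical, since this section does not say how a flag is converted into the fame score listed for FIGS. 5 and 6.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FameFlag(IntEnum):
    """Fame flag values as described: 1 nationwide, 2 regional, 0 otherwise."""
    NONE = 0
    NATIONWIDE = 1
    REGIONAL = 2


@dataclass(frozen=True)
class FacilityRecord:
    # Hypothetical field names for an entry in the facility DB 56.
    name: str
    category: str
    chain_id: Optional[str]  # None when the facility is not a chain store
    fame: FameFlag = FameFlag.NONE


# Hypothetical flag-to-score mapping on the same 0-100 scale as the other scores.
FAME_SCORE = {FameFlag.NATIONWIDE: 100, FameFlag.REGIONAL: 50, FameFlag.NONE: 0}


def fame_score(record: FacilityRecord) -> int:
    return FAME_SCORE[record.fame]
```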

The voice input unit 42 includes a speech recognition dictionary 44, a pronunciation score calculation unit 46, a store-count score calculation unit 48, a total score calculation unit 50, and a recognition result output unit 52.

The speech recognition dictionary 44 contains, for each of a plurality of pre-registered facilities (the recognition target vocabulary), the facility name and its pronunciation (phoneme model). The pronunciation score calculation unit 46 extracts features from the facility name spoken by the user and picked up by the microphone 7, compares them with the features of each facility's phoneme model stored in the speech recognition dictionary 44, and calculates the similarity for each facility as a pronunciation score. For example, if a pronunciation score of 100 is assigned when the features of the spoken facility name exactly match those of a facility's phoneme model, every other facility receives a pronunciation score below 100 that reflects its similarity.
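
The text leaves the feature type and the similarity measure unspecified. The sketch below assumes, for illustration only, that each utterance and each phoneme model is reduced to a single fixed-length feature vector, and maps the Euclidean distance d to a score of 100 / (1 + d), which is exactly 100 for an identical match and below 100 otherwise. Both the vector representation and the mapping are assumptions; a real recognizer would score time-varying acoustic features against phoneme models.

```python
import math


def pronunciation_score(utterance: list[float], model: list[float]) -> float:
    """Hypothetical score: 100 for identical features, falling as distance grows."""
    if len(utterance) != len(model):
        raise ValueError("feature vectors must have the same length")
    return 100.0 / (1.0 + math.dist(utterance, model))


def score_all(utterance: list[float],
              dictionary: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every facility in a toy dictionary, most similar first."""
    scored = [(name, pronunciation_score(utterance, feats))
              for name, feats in dictionary.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```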

For each of the facilities whose pronunciation scores were calculated by the pronunciation score calculation unit 46, the store-count score calculation unit 48 calculates a store-count score corresponding to the size (number of stores) of the chain to which the facility belongs, as attribute information of that facility. For example, chain size may be divided into 10 levels: a facility belonging to a chain in the largest level scores 100, one in the next level scores 90, and so on down through 80, 70, ..., 10 as the chains get smaller, while a facility that does not belong to a chain scores 0. This scoring scheme is only an example: the number of levels may differ from 10, the maximum score may be set to a value other than 100 (for example, 50), or other changes may be made as appropriate. The number of stores in the chain to which each facility belongs can be found by counting the facilities in the facility DB 56 that share the same chain ID; alternatively, each facility's chain store count may be stored in the facility DB 56 in association with that facility.
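
The counting and tiering above can be sketched as follows. The text says chain sizes are split into 10 levels but not where the level boundaries lie, so this sketch assumes, hypothetically, that chains are ranked by store count and split into equal rank bands, with the largest band scoring the maximum and each smaller band one step less. Store counts are obtained, as described, by counting facilities that share a chain ID (here a plain list, with `None` for facilities that are not chain stores).

```python
from collections import Counter
from typing import Optional


def store_count_scores(chain_ids: list[Optional[str]], levels: int = 10,
                       max_score: float = 100.0) -> dict[str, float]:
    """Hypothetical tiering: map each chain ID to a store-count score."""
    counts = Counter(cid for cid in chain_ids if cid is not None)
    n_chains = len(counts)
    scores = {}
    for cid, count in counts.items():
        # Rank = number of chains with strictly more stores, so ties share a level.
        rank = sum(1 for other in counts.values() if other > count)
        level = rank * levels // n_chains  # 0 = largest chains
        scores[cid] = max_score * (levels - level) / levels
    return scores


def store_count_score(chain_id: Optional[str], scores: dict[str, float]) -> float:
    """Non-chain facilities score 0, as in the text."""
    return 0.0 if chain_id is None else scores.get(chain_id, 0.0)
```

With ten chains of distinct sizes the scores come out as 100, 90, ..., 10; with fewer chains some levels are simply unused, and passing `max_score=50` reproduces the alternative maximum mentioned above.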

Based on the pronunciation scores calculated by the pronunciation score calculation unit 46 and the store-count scores calculated by the store-count score calculation unit 48, the total score calculation unit 50 calculates a total score, serving as the likelihood, for each of the facilities whose pronunciation scores were calculated. For example, with the pronunciation score as a first value A and the store-count score as a second value B, the total score C may be calculated by weighting each value and summing the results (C = aA + bB). If a and b are each 0.5 (50%), then C = 0.5A + 0.5B, and the total score C is a simple arithmetic mean. The weights a and b may also take values other than 0.5. Alternatively, instead of such weighting, the total score C may be calculated as the geometric mean (C = √(A × B)) or by another operation.
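
The two combination rules named above are short enough to write down directly. The function and parameter names are illustrative; `pronunciation` plays the role of A and `store_count` the role of B.

```python
import math


def total_score_weighted(pronunciation: float, store_count: float,
                         a: float = 0.5, b: float = 0.5) -> float:
    """C = aA + bB; a = b = 0.5 gives the simple arithmetic mean."""
    return a * pronunciation + b * store_count


def total_score_geometric(pronunciation: float, store_count: float) -> float:
    """C = sqrt(A * B), the geometric-mean alternative."""
    return math.sqrt(pronunciation * store_count)
```

One consequence worth checking before choosing between them: with the geometric mean, a facility whose store-count score is 0 (a non-chain facility) always gets a total of 0, however well it matches acoustically, whereas the weighted sum with a = b = 0.5 still lets it reach 50.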

The recognition result output unit 52 extracts a plurality of facility names (for example, 10) in descending order of the total scores calculated by the total score calculation unit 50 and outputs them as the recognition result. For example, the recognition result output unit 52 divides these facility names into a higher-scoring group and a lower-scoring group and creates recognition result image drawing data representing two pages of recognition results.

The input processing unit 60 outputs, to each part of the navigation controller 1, commands for performing the operations corresponding to the instructions entered through the operation unit 3. The display processing unit 70 receives the normal map drawing data created by the map drawing unit 14 and, based on it, displays a normal map of a predetermined area on the screen of the display device 5. When the guidance route drawing data representing the travel route and the like, created by the route guidance processing unit 32, is input, the display processing unit 70 displays the corresponding travel route and guidance diagrams for right- and left-turn intersections superimposed on the map on the screen of the display device 5. When the recognition result image drawing data created by the recognition result output unit 52 is input, the display processing unit 70 displays recognition results containing a plurality of facility names. These results are created as two pages: a first recognition result list containing the higher-scoring group is displayed initially, and when the user instructs a page turn, the display switches to a second recognition result list containing the lower-scoring group.

The pronunciation score calculation unit 46 described above corresponds to the similarity calculation means; the store-count score calculation unit 48 and the total score calculation unit 50 correspond to the likelihood calculation means; and the recognition result output unit 52 corresponds to the recognition result output means.

The navigation device 100 of this embodiment has the configuration described above; its operation is described next. FIG. 2 is a flowchart showing the procedure for performing speech recognition on input speech and extracting one facility.

When the user instructs a facility search, the voice input unit 42 starts speech recognition of a facility name. When a facility name is input by voice (step 100), the pronunciation score calculation unit 46 extracts features from the spoken facility name and calculates pronunciation scores for a plurality of facilities, in descending order of the similarity of their phoneme-model features (step 102). The store-count score calculation unit 48 also calculates store-count scores for the facilities whose pronunciation scores the pronunciation score calculation unit 46 calculates (step 104).

Next, the total score calculation unit 50 calculates total scores from the pronunciation scores calculated by the pronunciation score calculation unit 46 and the store-count scores calculated by the store-count score calculation unit 48 (step 106). The pronunciation scores, store-count scores, and total scores are not calculated in separate passes but in parallel.

FIG. 3 is a diagram showing a specific example of total scores calculated from pronunciation scores and store-count scores. In the example shown in FIG. 3, with weights a and b both set to 50%, facility P1 has a pronunciation score A of 60 and a store-count score B of 100, giving a total score C of 80. Similarly, facility P2 has a pronunciation score A of 70 and a store-count score B of 40, giving a total score C of 55, and facility P3 has a pronunciation score A of 80 and a store-count score B of 30, also giving a total score C of 55.
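
Applying C = aA + bB with a = b = 0.5 to the three facilities gives:

```latex
\begin{aligned}
C_{P1} &= 0.5 \times 60 + 0.5 \times 100 = 80 \\
C_{P2} &= 0.5 \times 70 + 0.5 \times 40  = 55 \\
C_{P3} &= 0.5 \times 80 + 0.5 \times 30  = 55
\end{aligned}
```

P1 therefore ranks first even though its pronunciation score is the lowest of the three, which is the effect the store-count score is meant to have. P2 and P3 tie at 55; this section does not say how ties are ordered.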

Next, the recognition result output unit 52 extracts a plurality of facilities (for example, 10) from the top in descending order of total score and outputs (displays) them as the recognition result (step 108).

FIG. 4 is a diagram showing a display example of the recognition results. As shown in FIG. 4, ten facilities are extracted as the recognition result in descending order of total score. The recognition result output unit 52 creates a first-page recognition result image D1 containing the facility names p1, p2, p3, p4, and p5 of the five highest-scoring facilities P1, P2, P3, P4, and P5, and a second-page recognition result image D2 containing the facility names p6, p7, p8, p9, and p10 of the five lower-scoring facilities P6, P7, P8, P9, and P10. The first-page image D1 is displayed on the display device 5 first. In this state, when the user instructs a page turn (through the operation unit 3 or, for example, by voice input), the second-page image D2 is displayed on the display device 5. Within each of the recognition result images D1 and D2, facilities with higher total scores are placed higher and those with lower total scores lower.
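
The ordering and paging described for FIG. 4 can be sketched as a sort followed by slicing. The function name and its defaults (10 results, 5 per page) are illustrative; ties keep their input order because Python's sort is stable, which is an assumption, since the text does not specify tie-breaking.

```python
def paginate_results(scores: dict[str, float], top_n: int = 10,
                     per_page: int = 5) -> list[list[str]]:
    """Take the top_n facility names by total score and split them into pages.

    Page 0 holds the highest-scoring names; each page lists the best first.
    """
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)[:top_n]
    return [ranked[i:i + per_page] for i in range(0, len(ranked), per_page)]
```

Because the best candidates are sliced first, the highest-scoring facility always lands at the top of page one, which is the property the text relies on to avoid page turning.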

With the recognition result images D1 and D2 displayed in this way, the facility selection unit 54 determines whether the user has selected a facility (facility name) (step 110). If no selection operation has been performed, the determination is negative and is repeated. When the user performs a selection operation designating a facility, the determination in step 110 is affirmative, and the facility selection unit 54 selects that facility according to the user's instruction (step 112). This completes the series of operations for extracting a facility by voice input. The extracted facility can be used as the target of a detailed information search by the detailed information search unit 58, or as the destination set by the destination setting unit 34.

As described above, the voice input unit 42 of the navigation device 100 of this embodiment combines the similarity between the uttered facility name and each of the recognition target vocabularies with attribute information associated with each vocabulary (here, the number of stores in the chain to which each facility belongs). This differentiates how likely each recognition target vocabulary is to be selected, independently of the acoustic recognition result, so the recognition rate can be improved compared with obtaining the recognition result from the user's speech alone.

In addition, a predetermined number of facility names are displayed as the speech recognition result in descending order of the total score (likelihood) calculated by the total score calculation unit 50, with the facility name having the highest total score placed at the top. Because the top-ranked facility name is the most likely to be the correct recognition result, the correct facility name is the easiest to notice and the easiest to select.

Furthermore, the recognition result is split across multiple pages, and the facility names with the highest total scores are placed on the first page. The facility name with the highest total score is therefore never placed on the second or a later page, which removes the need for tedious operations such as turning pages to display that name before selecting it.

In addition, when the total score is calculated, the pronunciation score and the store number score are each multiplied by a predetermined weight and then summed. Adjusting these weights tunes how much the chain's store count contributes to the total score, so that particular facilities can be given priority in the speech recognition result.
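Written as a formula, the weighted sum reads as follows. The notation (A for the pronunciation score, B for the store number score, C for the name recognition score, D for the total score, with weights a, b, and c) follows FIGS. 5 and 6. Writing the base embodiment as the two-term case is an interpretation of the text, not a formula quoted from it.

```latex
\begin{aligned}
D &= a\,A + b\,B          && \text{(pronunciation and store number scores)}\\
D &= a\,A + c\,C          && \text{(name recognition score instead, FIG. 5)}\\
D &= a\,A + b\,B + c\,C   && \text{(all three scores, FIG. 6)}
\end{aligned}
```

In the worked examples the weights are percentages that sum to 100%. Raising b relative to a increases the influence of the chain's store count on the ranking, and lowering it makes the result depend more on pronunciation alone.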

The present invention is not limited to the embodiment described above, and various modifications are possible within the scope of its gist. For example, in the embodiment above, the store number score is calculated from the number of stores in the chain to which each facility belongs, an attribute that correlates with how well known each facility is. Any other attribute information that correlates with a facility's name recognition may be used instead. For example, a name recognition score may be calculated from a name recognition flag associated with each facility and used instead of, or together with, the store number score. Alternatively, the usage history of each facility may be recorded, and a usage score that takes a larger value for more frequently used facilities may be used instead of, or together with, the store number score or the name recognition score.
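The two alternative attribute scores can be sketched as simple lookups. The flag-to-score values below are the ones that appear in FIG. 5 (flag "1" gives 100, flag "2" gives 50, flag "0" gives 0). Treating unknown flags as 0 is an assumption, and the usage-score rule (10 points per use, capped at 100) is purely hypothetical, since the description only says that more frequent use yields a larger value.

```python
# Name recognition flag -> score, using the values shown in FIG. 5.
# 1: famous nationwide, 2: famous in some regions, 0: no such flag.
NAME_RECOGNITION_SCORES = {1: 100, 2: 50, 0: 0}


def name_recognition_score(flag: int) -> int:
    """Map a name recognition flag to a score; unknown flags score 0 (assumption)."""
    return NAME_RECOGNITION_SCORES.get(flag, 0)


def usage_score(use_count: int, points_per_use: int = 10, cap: int = 100) -> int:
    """Hypothetical usage-history score: grows linearly with the number of
    recorded uses and saturates at `cap`. Illustrative only, not the
    disclosed formula."""
    return min(cap, max(0, use_count) * points_per_use)
```

Keeping every component score on the same 0 to 100 scale lets the weights alone decide how much each attribute counts.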

FIG. 5 shows a specific example of the total score calculated from the pronunciation score and the name recognition score. In the example of FIG. 5, facility P1 has a pronunciation score A of 60 and a name recognition score C of 100 (name recognition flag "1", famous nationwide); with weights a and c both set to 50%, its total score D is 80. Similarly, facility P2 has a pronunciation score A of 70 and a name recognition score C of 50 (name recognition flag "2", famous in some regions), giving a total score D of 60. Facility P3 has a pronunciation score A of 80 and a name recognition score C of 0 (name recognition flag "0"), giving a total score D of 40.
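The FIG. 5 arithmetic can be checked with a few lines of Python. This is only a sketch: `total_score` is a hypothetical helper that takes the weights in percent, as the figure states them, which keeps the arithmetic exact for these integer scores.

```python
def total_score(scores, weights_percent):
    """Weighted sum of component scores, with the weights given in percent."""
    if len(scores) != len(weights_percent):
        raise ValueError("one weight is needed per score")
    return sum(s * w for s, w in zip(scores, weights_percent)) / 100


# FIG. 5: (pronunciation score A, name recognition score C), weights a = c = 50%
fig5 = {"P1": (60, 100), "P2": (70, 50), "P3": (80, 0)}
for name, scores in fig5.items():
    print(name, total_score(scores, (50, 50)))
# prints: P1 80.0, then P2 60.0, then P3 40.0
```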

FIG. 6 shows a specific example of the total score calculated from the pronunciation score, the store number score, and the name recognition score. In the example of FIG. 6, facility P1 has a pronunciation score A of 60, a store number score B of 100, and a name recognition score C of 100; with weights a, b, and c of 50%, 30%, and 20%, its total score D is 80. Similarly, facility P2 has A = 70, B = 40, and C = 50, giving D = 57, and facility P3 has A = 80, B = 30, and C = 0, giving D = 49.
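The following sketch reproduces the FIG. 6 totals and compares the ranking by pronunciation score alone with the ranking by total score. The helper and variable names are illustrative, and the weights are given in percent as in the figure.

```python
def total_score(scores, weights_percent):
    """Weighted sum of component scores, with the weights given in percent."""
    return sum(s * w for s, w in zip(scores, weights_percent)) / 100


# FIG. 6: (A pronunciation, B store number, C name recognition); a, b, c = 50, 30, 20 %
fig6 = {"P1": (60, 100, 100), "P2": (70, 40, 50), "P3": (80, 30, 0)}
weights = (50, 30, 20)

totals = {name: total_score(scores, weights) for name, scores in fig6.items()}
by_pronunciation = sorted(fig6, key=lambda n: fig6[n][0], reverse=True)
by_total = sorted(totals, key=totals.get, reverse=True)

print(totals)            # {'P1': 80.0, 'P2': 57.0, 'P3': 49.0}
print(by_pronunciation)  # ['P3', 'P2', 'P1']
print(by_total)          # ['P1', 'P2', 'P3']
```

P3 has the highest pronunciation score but ends up last once the store count and name recognition are taken into account, which is exactly the reordering effect the weighting is meant to produce.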

Thus, when several facilities sound similar to the user's utterance, using the name recognition score instead of, or together with, the store number score raises the total score of well-known facilities. Well-known facility names are therefore preferentially obtained as the recognition result, which improves the recognition rate.

In the embodiment above, the facility search unit 40 and the voice input unit 42 are provided as part of the navigation device 100. However, they may be provided separately from the navigation device 100, with the facility name obtained as the recognition result and the detailed facility information obtained as the search result being input to the navigation device 100. The facility search unit 40 and the voice input unit 42 may also be combined with devices other than the navigation device 100, or used on their own.
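One way to picture this separation is an interface boundary between the recognizer and the navigation device. The sketch below is an assumption-laden illustration: `FacilityRecognizer`, `NavigationDevice`, `CannedRecognizer`, and the idea of passing raw audio as `bytes` are all hypothetical, chosen only to show that the navigation side needs nothing more than a ranked list of facility names.

```python
from typing import List, Protocol


class FacilityRecognizer(Protocol):
    """Hypothetical interface for a recognizer (voice input unit plus facility
    search unit) that may live inside or outside the navigation device."""

    def recognize(self, audio: bytes) -> List[str]:
        """Return facility names ranked by total score, best first."""
        ...


class NavigationDevice:
    """Consumes recognition results without caring where they are computed."""

    def __init__(self, recognizer: FacilityRecognizer) -> None:
        self._recognizer = recognizer

    def candidates_for(self, audio: bytes, limit: int = 10) -> List[str]:
        return self._recognizer.recognize(audio)[:limit]


class CannedRecognizer:
    """Stand-in recognizer returning a fixed ranking, for illustration only."""

    def __init__(self, ranking: List[str]) -> None:
        self._ranking = ranking

    def recognize(self, audio: bytes) -> List[str]:
        return list(self._ranking)


nav = NavigationDevice(CannedRecognizer(["p1", "p2", "p3"]))
print(nav.candidates_for(b"...", limit=2))  # ['p1', 'p2']
```

Because `Protocol` uses structural typing, an on-board recognizer and a remote service client can both be plugged in as long as they provide `recognize`.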

As described above, according to the present invention, combining the similarity between the uttered facility name and each of the recognition target vocabularies with attribute information associated with each vocabulary differentiates how likely each vocabulary is to be selected, independently of the acoustic recognition result. The recognition rate can therefore be improved compared with obtaining the recognition result from the user's speech alone.

Description of Symbols
1 Navigation controller
10 Map buffer
40 Facility search unit
42 Voice input unit
44 Speech recognition dictionary
46 Pronunciation score calculation unit
48 Store number score calculation unit
50 Total score calculation unit
52 Recognition result output unit
54 Facility selection unit
56 Facility DB (database)
58 Detailed information search unit
60 Input processing unit
70 Display processing unit
100 Navigation device

Claims

1. A facility search apparatus comprising:
similarity calculation means for performing speech recognition processing on a facility name uttered by a user and calculating a similarity for each of a plurality of recognition target vocabularies;
likelihood calculation means for calculating a likelihood for each of the plurality of recognition target vocabularies based on the similarity and on attribute information associated with each of the plurality of recognition target vocabularies; and
recognition result output means for outputting a speech recognition result corresponding to the facility name uttered by the user based on the likelihood calculated by the likelihood calculation means.

2. The facility search apparatus according to claim 1, wherein the recognition result output means displays, as the speech recognition result, the facility names indicated by a predetermined number of the recognition target vocabularies in descending order of the likelihood calculated by the likelihood calculation means, such that the facility name with the highest likelihood is placed at the top.

3. The facility search apparatus according to claim 1 or 2, wherein the recognition result output means divides the facility names indicated by a predetermined number of the recognition target vocabularies into a plurality of pages and displays the speech recognition result such that the facility name with the highest likelihood is included in the first page.

4. The facility search apparatus according to any one of claims 1 to 3, wherein the likelihood calculation means calculates the likelihood by applying a predetermined weight to each of a first value corresponding to the similarity and a second value corresponding to the attribute information and adding the weighted values.

5. The facility search apparatus according to any one of claims 1 to 4, wherein the attribute information is information correlated with the degree of name recognition of the facility name corresponding to each of the plurality of recognition target vocabularies.

6. The facility search apparatus according to any one of the preceding claims, wherein, when the facility corresponding to each of the plurality of recognition target vocabularies is a store of a chain, the attribute information is information indicating the number of stores belonging to the chain.

7. The facility search apparatus according to any one of claims 1 to 6, wherein the attribute information is information indicating that the facility corresponding to each of the plurality of recognition target vocabularies is famous nationwide and/or information indicating that the facility is famous in one of a plurality of regions into which the country is divided.