JP2016031395A - Reference display device and program

Info

Publication number: JP2016031395A
Application number: JP2014152480A
Authority: JP
Inventors: 紀行畑; Noriyuki Hata
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2014-07-28
Filing date: 2014-07-28
Publication date: 2016-03-07
Also published as: TW201610980A; WO2016017623A1

Abstract

PROBLEM TO BE SOLVED: To provide a reference display device and program that present a reference in an intuitive, easy-to-understand way. SOLUTION: A CPU 11 displays linear images from a subjective viewpoint in which the depth direction corresponds to time and the planar direction (the vertical direction in this example) corresponds to the musical scale. Linear images corresponding to the reference data are displayed on the screen, and a character image is displayed for each linear image. A score corresponding to the scoring result is displayed in the upper part of the screen. The CPU 11 scrolls the linear images and the background so that the character image moves in the depth direction along its linear image (that is, along the singer's pitch). SELECTED DRAWING: Figure 4

Description

The present invention relates to a display device, and more particularly to a device that displays a model (reference).

Conventionally, karaoke devices display lyrics and the model pitch on a display unit (see, for example, Patent Document 1). The pitch is displayed as a so-called piano roll. A piano roll shows, on a screen whose vertical axis corresponds to the musical scale (a piano keyboard turned on its side) and whose horizontal axis corresponds to time, a linear image for each note that reflects the note's onset timing and duration. This lets the singer see at a glance when to sing and at what pitch.

Patent Document 1: JP 2004-205817 A

However, the conventional piano roll is not always intuitive or easy to understand. For example, when several people sing a duet, displaying every singer's piano roll on the same screen causes the linear images to overlap wherever the pitches coincide, which makes the display hard to read.

Accordingly, an object of the present invention is to provide a reference display device and a program that provide a more intuitive and easy-to-understand display.

The reference display device of the present invention includes a display unit and an image processing unit that, based on reference data, displays on the display unit a linear image indicating sound generation timing, pitch, and sound generation length.

The image processing unit is characterized in that it displays the linear image from a subjective viewpoint in which the depth direction corresponds to the time axis and the planar direction corresponds to pitch. A subjective viewpoint is a display mode that imitates the user's own field of view. A mode in which an image corresponding to the user (such as a character image) is displayed and shown from behind also counts as a subjective viewpoint.

As a result, the linear image serving as the model is displayed in the subjective space of the user (a singer, speaker, performer, or the like). The user can grasp the timing and pitch of each note more intuitively and can enjoy singing or the like.

The image processing unit may display an image corresponding to the user (a photograph of the user, a character image, or the like) at the position corresponding to the current sound generation timing, and scroll the linear image so that this image moves along it. In this case, users can feel as if they are moving the character with their own voice, and can enjoy singing, language learning, instrumental performance, or the like.

When, for example, a duet is sung, the reference display device displays the reference data for each singer as a plurality of linear images arranged side by side from the subjective viewpoint. Because the linear images are arranged side by side from the subjective viewpoint, they do not overlap even on a single screen, and high visibility is obtained.

A character corresponding to the user and characters corresponding to other singers can also be displayed side by side. Because the other singers appear within the user's own subjective space, the user gets a stronger sense of singing together with them.

The reference display device preferably includes input means for inputting the user's vocalization and scoring means for scoring by comparing that vocalization with the reference data, and the image processing unit preferably displays the scoring result of the scoring means together with the linear image.

In this case, the displayed image changes according to the result of the user's own singing or the like, so the user can enjoy it even more.

The reference display device of the present invention can provide a more intuitive and easy-to-understand display.

A block diagram showing the configuration of the karaoke system.
A block diagram showing the configuration of the karaoke device.
A block diagram showing the configuration of the center.
A diagram showing the structure of various data, including the reference data.
A diagram showing a display example of the reference.
A diagram showing a display example of the reference.
A flowchart showing the operation of the karaoke device.
A display mode of the reference according to another example.
A display mode of the reference according to another example.
A block diagram showing the minimum configuration of the reference display device.

FIG. 1 shows the configuration of a karaoke system that includes the reference display device of the present invention. The karaoke system consists of a center (server) 1 and a plurality of karaoke stores 3, connected via a network 2 such as the Internet.

Each karaoke store 3 has a relay device 5, such as a router, connected to the network 2, and a plurality of karaoke devices 7 connected to the network 2 via the relay device 5. The relay device 5 is installed in, for example, the store's management room. One karaoke device 7 is installed in each private room (karaoke box), and each karaoke device 7 is provided with a remote controller 9.

Each karaoke device 7 can communicate with other karaoke devices 7 via the relay device 5 and the network 2. In this karaoke system, karaoke devices 7 installed in different locations can communicate with each other so that several singers can sing a duet.

FIG. 2 is a block diagram showing the configuration of the karaoke device. FIG. 3 is a block diagram showing the configuration of the center 1.

The karaoke device 7 corresponds to the reference display device of the present invention. The karaoke device 7 includes a CPU 11, a RAM 12, an HDD 13, a network interface (I/F) 14, an LCD (touch panel) 15, a microphone 16, an A/D converter 17, a sound source 18, a mixer (effector) 19, a sound system (SS) 20, a speaker 21, a decoder 22 (MPEG or the like), a display processing unit 23, a monitor 24, an operation unit 25, and a transmission/reception unit 26.

The CPU 11, which controls the operation of the entire device, is connected to the RAM 12, the HDD 13, the network interface (I/F) 14, the LCD (touch panel) 15, the A/D converter 17, the sound source 18, the mixer (effector) 19, the decoder 22, the display processing unit 23, the operation unit 25, and the transmission/reception unit 26.

The HDD 13 stores the operating program for the CPU 11. The RAM 12, which serves as work memory, is allocated an area into which the CPU 11's operating program is loaded for execution, an area into which music data is loaded for karaoke performance, an area into which reference data such as the guide melody is loaded, an area for temporarily storing data such as the reservation list and scoring results, and so on.

The HDD 13 also stores music data for performing karaoke songs, as well as video data for displaying background video on the monitor 24. The video data includes both moving images and still images, and also includes the character images described later. The music data and video data are periodically distributed from the center 1 and updated.

As shown in FIG. 3, the center 1 includes a CPU 31, a RAM 32, an HDD 33, and a network interface (I/F) 34. The HDD 33 stores the operating program for the CPU 31. The RAM 32, which serves as work memory, is allocated an area into which the CPU 31's operating program is loaded for execution, among others. The CPU 31 is a control unit that controls the center 1 as a whole. When a duet is to be performed over the network, the CPU 31 performs connection processing to connect the karaoke devices 7 involved. It also performs distribution processing for the singing data stored in the HDD 33.

The HDD 33 stores music data and video data for distribution. The HDD 33 also stores information on each singer, such as the singer's name. For example, when a singer operates a karaoke device and enters the name of a particular singer, that information is sent to the center 1. The CPU 31 searches the singer information stored in the HDD 33 for the entered name. The CPU 31 then connects the karaoke device on which the name was entered to the karaoke device used by the singer that was found. This enables duet singing over the network.

The HDD 33 further stores singing data and scoring data. Singing data is, for example, a recording of a singer's voice from a past performance. Scoring data indicates the scoring result for that performance. The singing data and scoring data are distributed when a singer requests them. This allows a singer, for example, to sing a duet with a recording of his or her own past performance. The singing data and scoring data may also come from a performance by, for example, a professional singer or voice actor, which allows each singer to sing a duet with a favorite singer or voice actor.

Returning to FIG. 2, the karaoke device 7 is described next. The CPU 11 is a control unit that controls the karaoke device as a whole; it functionally incorporates a sequencer and performs the karaoke performance. The CPU 11 also performs audio signal generation, video signal generation, scoring, and linear image display processing. The CPU 11 thereby functions as the image processing unit of the present invention.

The touch panel 15 and the operation unit 25 are provided on the front of the karaoke device. Based on operation information input from the touch panel 15, the CPU 11 displays a corresponding image on the touch panel 15, thereby implementing a GUI. The remote controller 9 implements a similar GUI. The CPU 11 performs various operations based on operation information input from the touch panel 15, from the operation unit 25, or from the remote controller 9 via the transmission/reception unit 26.

Next, the configuration for karaoke performance is described. As noted above, the CPU 11 functionally incorporates a sequencer. The CPU 11 reads from the HDD 13 the music data corresponding to the song number of a reserved song registered in the reservation list in the RAM 12, and performs the karaoke performance with the sequencer.

As shown in FIG. 4, for example, the music data consists of a header in which the song number and the like are written, a musical sound track containing MIDI data for the performance, a guide melody track containing MIDI data for the guide melody, a lyrics track containing MIDI data for the lyrics, a chorus track containing the back chorus playback timing and the audio data to be played, a breath position track indicating breath timings, a technique position track indicating the timing of singing techniques, and so on. The guide melody track, breath position track, and technique position track correspond to the reference data of the present invention. Reference data is model data that the singer uses as a guide for singing, and contains information indicating the sound generation timing, pitch, and sound generation length of each note.

The format of the music data is not limited to this example. For example, existing music data without a breath position track or technique position track may be used, with the breath position track and technique position track provided as separate data. In this case, there is no need to prepare new music data that contains those tracks. However, the breath position track data and the technique position track data each carry song identification information such as the song number. When reading the music data, the CPU 11 also reads the corresponding breath position track and technique position track and performs the sequence operation.

The format of the reference data is likewise not limited to the MIDI format described above. For example, the reference data indicating breath positions may be text data giving the timing of each breath position (the time elapsed from the start of the song). When the reference data is audio data (for example, a recording of a singing voice), the pitch can be extracted from the audio data, and the sound generation timing and sound generation length can be obtained from the time at which each pitch is detected and how long it lasts. Silent sections can also be detected by measuring the volume (power); when a silent section lies between two notes, the time at which it is detected can be taken as a breath position. Furthermore, when the pitch fluctuates regularly within a given period, that period can be judged to contain vibrato, which makes it possible to extract the timing at which a singing technique was performed (the technique position).
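The silent-section approach to finding breath positions can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `breath_positions`, the per-frame power list, the frame duration, and the threshold are all hypothetical and do not appear in the specification, which gives no concrete algorithm or values.

```python
def breath_positions(power, frame_sec, threshold):
    """Return start times (seconds) of silent runs that lie between two voiced runs.

    power      -- per-frame volume (power) values (assumed input format)
    frame_sec  -- duration of one frame in seconds (assumed)
    threshold  -- power below this value counts as silence (assumed)
    """
    positions = []
    seen_voice = False      # becomes True once the first voiced frame is found
    silent_start = None     # index of the first frame of the current silent run
    for i, p in enumerate(power):
        if p >= threshold:
            # A voiced frame closes any silent run that followed earlier voice.
            if seen_voice and silent_start is not None:
                positions.append(silent_start * frame_sec)
            seen_voice = True
            silent_start = None
        elif silent_start is None:
            silent_start = i
    return positions
```

Leading and trailing silence is ignored in this sketch, because the text only treats silence between notes as a breath position.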

The musical sound track records information indicating the type of instrument that produces each musical sound, along with its timing, pitch (key), intensity, length, localization (pan), sound effects, and so on. The guide melody track records information such as the onset timing and duration of each note of the model vocal.
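One way to picture the reference data described above is as a small set of records. The following is a minimal sketch, assuming times in seconds and pitches as MIDI note numbers; the class and field names (`Note`, `ReferenceData`, `SongData`) are hypothetical and are not the data format of the specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Note:
    start: float   # onset time in seconds from the start of the song
    length: float  # duration in seconds
    pitch: int     # pitch as a MIDI note number (assumption)


@dataclass
class ReferenceData:
    guide_melody: List[Note] = field(default_factory=list)
    breath_positions: List[float] = field(default_factory=list)  # seconds
    # (time in seconds, technique name) pairs, e.g. (12.5, "vibrato")
    technique_positions: List[Tuple[float, str]] = field(default_factory=list)


@dataclass
class SongData:
    song_number: str          # identification info carried in the header
    reference: ReferenceData


song = SongData(
    song_number="0001",  # made-up example value
    reference=ReferenceData(
        guide_melody=[Note(0.0, 0.5, 60), Note(0.5, 0.5, 62)],
        breath_positions=[1.0],
        technique_positions=[(0.5, "vibrato")],
    ),
)
```

Keeping the breath and technique positions in their own lists mirrors the option of supplying them separately from existing music data, keyed by the song number.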

The sequencer controls the sound source 18 based on the musical sound track data and generates the musical sound of the karaoke song.

The sequencer also plays back the back chorus audio data (compressed audio data, such as MP3, that accompanies the music data) at the timing specified by the chorus track. In addition, based on the lyrics track, the sequencer composes the character patterns of the lyrics in sync with the progress of the song, converts them into a video signal, and inputs the signal to the display processing unit 23.

The sound source 18 forms a musical sound signal (a digital audio signal) according to the data (note event data) input from the CPU 11 through the sequencer's processing. The resulting musical sound signal is input to the mixer 19.

The mixer 19 applies sound effects such as echo to the musical sound signal generated by the sound source 18, to the chorus sound, and to the singer's voice signal input from the microphone (singing voice input means) 16 via the A/D converter 17, and mixes these signals.

When karaoke devices 7 installed in different locations communicate with each other to perform a duet, a singing voice signal is sent from the other karaoke device. The singing voice signal received from the other karaoke device is also input to the mixer 19, where it is mixed with the singing voice signal input from the local microphone 16.

The mixed digital audio signal is input to the sound system 20. The sound system 20 incorporates a D/A converter and a power amplifier; it converts the input digital signal into an analog signal, amplifies it, and emits it from the speaker (musical sound generating means) 21. The effects that the mixer 19 applies to each audio signal and the mixing balance are controlled by the CPU 11.
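The mixing step can be illustrated with a few lines of code. This is a rough sketch only, assuming each signal is a list of samples in the range -1.0 to 1.0 and that the mixing balance is a simple per-signal gain; the function `mix` and its parameters are hypothetical, and effects such as echo are omitted.

```python
def mix(signals, gains=None):
    """Sum several equal-rate sample lists with per-signal gains and clip to [-1.0, 1.0]."""
    if gains is None:
        gains = [1.0] * len(signals)       # equal balance by default (assumption)
    length = min(len(s) for s in signals)  # mix only the overlapping part
    mixed = []
    for i in range(length):
        total = sum(g * s[i] for g, s in zip(gains, signals))
        mixed.append(max(-1.0, min(1.0, total)))  # hard clip to avoid overflow
    return mixed
```

For a network duet, the local microphone signal and the voice signal received from the other karaoke device would simply be two more entries in `signals`.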

In sync with the sequencer's generation of musical sound and of the lyrics telop, the CPU 11 reads the video data stored in the HDD 13 and plays back the background video and the like. Moving-image video data is encoded in MPEG format.

The CPU 11 can also download video data such as a photograph representing the singer or a character from the center 1 and input it to the display processing unit 23. The photograph representing the singer can be taken on the spot with a camera (not shown) provided on the karaoke device or the remote controller 9, or with a camera on a mobile terminal or the like owned by the user.

The CPU 11 inputs the background video data it has read to the decoder 22. The decoder 22 converts the input MPEG or similar data into a video signal and inputs it to the display processing unit 23. In addition to the background video signal, the display processing unit 23 receives the character patterns of the lyrics telop, the video signal of the linear image based on the guide melody track, and the video signal of the character image.

A conventional piano roll displays, on a screen whose vertical axis corresponds to the musical scale (a piano keyboard turned on its side) and whose horizontal axis corresponds to time, linear images that reflect each note's onset timing and duration. In this embodiment, by contrast, the linear image is displayed from a subjective viewpoint in which the depth direction corresponds to the time axis and the planar direction (the vertical direction in this example) corresponds to the musical scale, as shown in FIG. 5(A). A subjective viewpoint is a display mode that imitates the user's own field of view. A mode such as that of FIG. 5(A), in which an image corresponding to the user (such as a character image) is displayed and shown from behind, also counts as a subjective viewpoint.
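The mapping of time to depth and pitch to the vertical direction can be illustrated with a simple pinhole-style perspective projection. This is a sketch under stated assumptions, not the rendering method of the specification: the function `project`, the focal length, the camera pitch, and the unit scales are all hypothetical.

```python
def project(time_ahead, pitch, lane_x=0.0, focal=1.0,
            cam_pitch=60.0, seconds_per_unit=1.0, semitones_per_unit=12.0):
    """Project a point of the linear image onto the screen plane.

    time_ahead -- seconds between the current playback position and the point
    pitch      -- pitch of the point (MIDI note number, an assumption)
    lane_x     -- sideways position of the line in world units
    Returns (x, y) in screen units relative to the screen centre.
    """
    if time_ahead < 0:
        raise ValueError("point is behind the viewpoint")
    depth = focal + time_ahead / seconds_per_unit  # time maps to depth
    scale = focal / depth                          # farther points shrink
    x = lane_x * scale
    y = (pitch - cam_pitch) / semitones_per_unit * scale  # pitch maps to height
    return x, y
```

With these defaults, a note an octave above the camera pitch appears at height 1.0 when it is at the current position and at 0.5 when it is one second ahead, so upcoming notes converge toward the centre of the screen as they recede.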

The large spherical images 502 displayed over the linear image in the figure mark the onset timing of each note in the guide melody track. They tell the user to start a note at the moment a spherical image 502 is reached. The small spherical images 503 correspond to scoring timings; points are awarded if the pitch of the singing voice matches that of the guide melody. When points are awarded, star-shaped images 505 are displayed as if flowing, and the points are added to the score field 504 displayed at the upper right of the figure. When the pitch matches several times in a row, a message such as "25 combo" is displayed in the upper part of the figure, so the user can see how many consecutive matches have been achieved.
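The score field and combo counter behave like a small state update at each scoring timing. The following sketch assumes a fixed number of points per match and a combo that resets on any miss; the function `update_score` and the value of `points` are hypothetical, since the specification does not state them.

```python
def update_score(score, combo, matched, points=10):
    """Return the new (score, combo) after one scoring timing.

    matched -- True if the sung pitch matched the guide melody at this timing
    points  -- points awarded per match (assumed value)
    """
    if matched:
        return score + points, combo + 1
    return score, 0  # a miss breaks the run of consecutive matches


score, combo = 0, 0
for matched in [True, True, False, True]:
    score, combo = update_score(score, combo, matched)
```

After this sequence the score is 30 and the combo is 1, because the third timing was missed.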

The term "linear image" as used in the present invention is not limited to a thin line such as that shown in FIG. 5(B). As in the example of FIG. 5(A), it also covers an image that has some width in the horizontal or vertical direction and extends in one direction (the depth direction in this example).

As shown in FIG. 5(B), in this embodiment the linear images corresponding to the reference data (linear images 501A, 501B, and 501C in FIG. 5(A)) are displayed so as to extend in the depth direction. The corresponding character images 101A, 101B, and 101C are displayed on the linear images 501A, 501B, and 501C, respectively. The score corresponding to the scoring result is displayed at the top of the screen.

First, the CPU 11 generates linear images based on the onset timing and duration of each note contained in the guide melody track. The CPU 11 then joins the linear images of the individual notes smoothly. In one mode, the connecting portion between notes is drawn with a uniform slope, for example by always joining notes over a time span corresponding to a sixteenth note. In practice, however, each song is sung in its own way, and pitch does not always change in the same manner. It is therefore preferable to draw each connecting portion with its own slope. In that case, the reference data may contain information specifying the slope of each connecting portion according to how the pitch changes there. Alternatively, image data prepared in advance for each song (corresponding to the linear image) may be read and displayed.
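The uniform-slope variant, in which every transition spans one sixteenth note, can be sketched as a polyline builder. This is an illustrative sketch only: it assumes a 4/4-style beat where a sixteenth note lasts a quarter of a beat, notes given as `(start, length, pitch)` tuples that follow one another without gaps, and a hypothetical function name `build_polyline`.

```python
def build_polyline(notes, bpm):
    """Return (time, pitch) vertices of a line joining consecutive notes.

    Each note is held flat, then ramps to the next note's pitch over the
    final sixteenth-note span before the next onset.
    """
    transition = 60.0 / bpm / 4.0  # duration of a sixteenth note in seconds
    points = []
    for i, (start, length, pitch) in enumerate(notes):
        end = start + length
        points.append((start, pitch))
        has_next = i + 1 < len(notes)
        # Leave room for the ramp unless this is the last note.
        flat_end = end - min(transition, length) if has_next else end
        if flat_end > start:
            points.append((flat_end, pitch))
    return points
```

At 120 beats per minute the transition lasts 0.125 seconds, so two one-second notes at pitches 60 and 64 yield the vertices (0.0, 60), (0.875, 60), (1.0, 64), and (2.0, 64). A per-transition slope, which the text prefers, would replace the single `transition` value with one value per note boundary.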

The CPU 11 then breaks the linear image at the breath timings indicated by the breath position track. For example, described in terms of the conventional objective viewpoint shown in FIG. 5(B), a breath timing lies between the end of "akai" and the onset of "hanaga", so the linear image for "akai" and the linear image for "hanaga" are separated.
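Breaking the line at breath timings amounts to grouping the notes into phrases. A minimal sketch, assuming notes are `(start, length, pitch)` tuples sorted by onset and breath timings are given in seconds; the function `split_at_breaths` is hypothetical.

```python
def split_at_breaths(notes, breaths):
    """Group notes into phrases, starting a new phrase after each breath timing."""
    phrases, current = [], []
    pending = sorted(breaths)
    for note in notes:
        start = note[0]
        cut = False
        # Consume every breath timing that falls at or before this onset.
        while pending and pending[0] <= start:
            pending.pop(0)
            cut = True
        if cut and current:
            phrases.append(current)
            current = []
        current.append(note)
    if current:
        phrases.append(current)
    return phrases
```

Each returned phrase would then be drawn as its own connected linear image, leaving a visible gap where the singer is expected to breathe.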

The CPU 11 then displays the linear image from the subjective viewpoint shown in FIG. 5(A). This lets the singer grasp the timing and pitch of each note more intuitively. Joining the linear images smoothly is not essential to the present invention. Doing so, however, lets the singer see how the notes connect and where the breaths fall. In addition, because the CPU 11 breaks the linear image at the breath timings indicated by the breath position track (that is, because the reference data contains information indicating breath timings), breath sections can be displayed so that they are clearly distinguished from mere silent sections, and the user can see exactly where to breathe.

The CPU 11 also displays a character at the current singing position. In this example, the user's own character image 101A is displayed at the bottom centre of the screen, and the linear image and background are scrolled so that the character image 101A moves in the depth direction along the linear image (that is, along the singer's pitch). If the singer sings the wrong pitch, the character image 101A is displayed away from the linear image, so the singer can see intuitively how far off the pitch was.
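How far the character is drawn from the line can be derived from the difference between the sung pitch and the reference pitch. The sketch below uses the standard equal-temperament conversion from frequency to MIDI note number (A4 = 440 Hz = note 69); treating the displayed offset as this difference in semitones is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def character_offset(sung_hz, reference_note):
    """Return the vertical offset of the character from the line, in semitones.

    Positive values mean the singer is sharp, negative values flat.
    """
    return hz_to_midi(sung_hz) - reference_note
```

An offset of zero keeps the character on the linear image, while a singer an octave too high would be drawn 12 semitones above it.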

The CPU 11 also displays a score according to the scoring result. Scoring is performed by comparing the singer's voice with the guide melody track: for each note of the guide melody track, the pitch of the singing voice is compared with that of the guide melody. A high score is awarded when the pitch of the singing voice matches the pitch of the guide melody track (falls within an allowable range) for at least a predetermined time. The timing of pitch changes is also taken into account. Further points are added based on the timing of pitch changes and on the presence of singing techniques such as vibrato, inflection, and scooping (sliding up smoothly from a lower pitch).
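The per-note pitch comparison can be sketched as follows. This is a simplified illustration, not the scoring method of the specification: the tolerance, the minimum matching time, the points value, and the choice to require one continuous matching run are all assumptions, and `score_note` is a hypothetical name.

```python
def score_note(sung_pitches, ref_pitch, frame_sec,
               tolerance=0.5, min_match_sec=0.2, points=100):
    """Score one guide-melody note from per-frame sung pitches.

    sung_pitches -- pitch per frame in semitones (None where no pitch was detected)
    ref_pitch    -- pitch of the guide-melody note in semitones
    Awards `points` if the sung pitch stays within `tolerance` of the reference
    for one continuous run of at least `min_match_sec`, otherwise 0.
    """
    longest = run = 0
    for p in sung_pitches:
        if p is not None and abs(p - ref_pitch) <= tolerance:
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # an unvoiced or off-pitch frame ends the current run
    return points if longest * frame_sec >= min_match_sec else 0
```

Bonuses for pitch-change timing and for techniques such as vibrato would be added on top of this base score.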

In the scoring process, whether the singer took a breath at each breath timing contained in the breath position track may also earn points. Whether a breath was taken is determined as follows. If no voice is input from the microphone 16 (the input level is below a predetermined threshold) within a predetermined time that includes the breath timing, or if a breath sound is picked up, it is determined that a breath was taken. If voice is input from the microphone 16 (the input level is at or above the threshold), it is determined that no breath was taken. Whether a breath sound was picked up is judged, for example, by comparing the input with the waveform of a breath sound using pattern matching or the like. When no breath is detected, the CPU 11 may display the character image falling off the ground (at the place where the linear image is broken).
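The breath decision can be sketched as a check on the microphone level around the breath timing. This is an illustration under assumptions: the window is centred on the breath timing, every frame in the window must be below the threshold, and breath-sound pattern matching is represented only by a flag supplied by the caller. The function `breath_taken` and its parameters are hypothetical.

```python
def breath_taken(levels, frame_sec, breath_time, window_sec, threshold,
                 breath_sound_detected=False):
    """Decide whether the singer breathed at `breath_time`.

    levels -- per-frame microphone input levels; frame i starts at i * frame_sec
    breath_sound_detected -- result of a separate breath-sound matcher (not shown)
    """
    if breath_sound_detected:
        return True
    lo = breath_time - window_sec / 2.0
    hi = breath_time + window_sec / 2.0
    window = [lv for i, lv in enumerate(levels) if lo <= i * frame_sec <= hi]
    # Silence throughout the window is taken as a breath (assumption).
    return bool(window) and all(lv < threshold for lv in window)
```

A `False` result is the case in which the display could show the character falling through the gap in the linear image.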

In the scoring process of this embodiment, it is preferable to give a higher score when, at the timing of each technique included in the technique position track, the same technique is detected in the singing.

The scoring process may be performed in each karaoke device, or it may be performed in the center 1 (or another server). When a duet is being performed with another karaoke device via the network, the scoring process may be performed by a single karaoke device that carries out the processing on behalf of the others.

As described above, the character image 101A moves according to the singer's own singing, and the singer can check the scoring result of that singing as a score. The singer can therefore enjoy karaoke as if playing a game.

In the example of FIG. 5(A), when a duet is sung, the character image 101A corresponding to the singer is displayed side by side with the character images 101B and 101C corresponding to the other singers. Music data for a duet song includes a plurality of reference data (guide melody tracks) for a plurality of singers. The CPU 11 therefore displays the linear images corresponding to the guide melody tracks side by side, and displays a character image for each linear image. This gives the user a stronger sense of singing together with other singers. Furthermore, if the scores are added together according to each singer's scoring result, the singers can feel that they are cooperating to earn points, which gives a greater sense of unity.

The microphone 16 may be provided with a sensor capable of detecting its attitude, such as a gyro sensor or an acceleration sensor, and the subjective-viewpoint image may be changed according to the attitude of the microphone 16. For example, in FIG. 5(A) the character image 101A is displayed near the horizontal center. When the singer points the microphone 16 to the right, the character images and linear images on the screen move to the left so that the character image 101C approaches the horizontal center. Thus, when the singer turns to the right, the subjective-viewpoint screen also appears to turn to the right, which gives a more realistic and intuitive display.

When the singer selects music data that is not for a duet song (that is, when different singers sing the same melody), the CPU 11 displays, for example, identical linear images side by side and displays a character image for each linear image.

The example of FIG. 5(A) uses a subjective viewpoint in which the vertical direction corresponds to the musical scale. However, as shown in FIG. 6, for example, a subjective viewpoint in which the horizontal direction corresponds to the musical scale may be used instead. In that case, the pitch becomes lower toward the left and higher toward the right.

This embodiment shows an example in which the linear image is displayed from a subjective viewpoint, but it may instead be displayed from an objective viewpoint, as shown in FIG. 8, for example. In the example of FIG. 8, the horizontal direction corresponds to the time axis and the vertical direction corresponds to the musical scale. In this example, the linear image corresponds to an image of the ground, and the ground image is interrupted at breath timings. The linear image and the background are scrolled so that the character image 101A moves along the linear image (the ground image). Because the ground is interrupted at a breath timing, the character image 101A falls off the ground when a breath (a state in which no voice is input from the microphone 16) is not detected. In this example as well, the singing score is displayed on the screen, so the singer can again enjoy karaoke like a game.

Next, the operation of the karaoke system will be described with reference to a flowchart. FIG. 7 is a flowchart showing the operation of the karaoke system.

First, the CPU 11 accepts a song request from the singer (s11). The CPU 11 then determines whether the request is for a duet over the network (s12). If it is, the CPU 11 sends a connection request to the center 1 (s13). At this time, the CPU 11 displays on the monitor 24 an image asking whether to perform a duet with a singer at another karaoke device connected via the network, and accepts the network duet. For example, when the singer enters the name of a specific user using the touch panel 15, the operation unit 25, or the remote control 9, information on that name is transmitted to the center 1.

The center 1 receives the connection request (s51), reads the singer information stored in the HDD 33, and performs connection processing with the karaoke device of the corresponding singer (s52). Each karaoke device is thereby connected to the other karaoke device (s14), and a duet becomes possible by transmitting and receiving each singer's singing data.

Next, the CPU 11 of the karaoke device 7 reads the requested music data (s15) and performs drawing processing (s16). That is, the CPU 11 reads the video data stored in the HDD 13, generates a background video, and composites the character image onto the background video. The CPU 11 also generates the linear image based on the sounding start timing and sounding length of each note contained in the guide melody track. The CPU 11 reads the breath timing information from the breath position track, connects the linear images of the individual notes smoothly, and displays the linear image with breaks at the breath timings indicated by the breath position track. The linear image is composited onto the background video in association with each character image. For a duet, a plurality of linear images and character images are displayed side by side.
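The generation of the linear image from the guide melody track and the breath position track can be sketched as below. The tuple layout of the notes, the function name, and the use of straight joins between notes are assumptions. A real renderer would presumably interpolate the joins to make them smooth.

```python
from typing import List, Tuple

# (start_ms, duration_ms, pitch): hypothetical layout of a guide melody note.
NoteT = Tuple[int, int, float]
Point = Tuple[int, float]  # (time_ms, pitch)


def build_segments(notes: List[NoteT],
                   breath_times_ms: List[int]) -> List[List[Point]]:
    """Join consecutive notes into polylines, breaking at breath timings.

    notes must be sorted by start time. Consecutive points of one polyline
    are drawn connected, so notes inside a polyline form one continuous line.
    """
    segments: List[List[Point]] = []
    current: List[Point] = []
    prev_end = None
    for start, dur, pitch in notes:
        # Start a new polyline if a breath falls between the previous
        # note's end and this note's start.
        if current and any(prev_end <= b <= start for b in breath_times_ms):
            segments.append(current)
            current = []
        current.append((start, pitch))
        current.append((start + dur, pitch))
        prev_end = start + dur
    if current:
        segments.append(current)
    return segments
```

With three notes and one breath timing between the second and third note, the function returns two polylines, which corresponds to a linear image that is interrupted once.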

The CPU 11 preferably displays the linear image for a sokuon (geminate consonant) phoneme separately from the linear image that follows it. The sokuon is written with the small kana "っ" or "ッ" in Japanese and leaves a silence before the sound that follows. The CPU 11 extracts each sokuon from the lyrics track and breaks the linear image at the timing at which the extracted sokuon is uttered. For example, in the lyric "つかったら" (tsukattara), a sokuon follows "か" (ka), so the linear image for "か" and the linear image for the following "た" (ta) are displayed separately. An image indicating that a sokuon phoneme is present (in this example, a square image labeled "っ") may also be displayed.
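The sokuon extraction can be illustrated with a short sketch that works on a plain kana string. The function names are hypothetical, and an actual lyrics track would also carry timing for each syllable, which is omitted here.

```python
from typing import List

SOKUON = {"っ", "ッ"}  # small tsu in hiragana and katakana


def sokuon_indices(kana: str) -> List[int]:
    """Indices of sokuon characters in the lyric string."""
    return [i for i, ch in enumerate(kana) if ch in SOKUON]


def split_at_sokuon(kana: str) -> List[str]:
    """Split the lyric into groups that are drawn as separate line images."""
    groups: List[str] = []
    current = ""
    for ch in kana:
        if ch in SOKUON:
            # The sokuon itself is silent, so it closes the current group.
            if current:
                groups.append(current)
            current = ""
        else:
            current += ch
    if current:
        groups.append(current)
    return groups
```

For the lyric in the example above, `split_at_sokuon("つかったら")` yields the groups "つか" and "たら", so the line for "か" and the line for "た" end up in different groups.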

The CPU 11 also preferably reads the technique position track and displays singing techniques on the linear image. For example, in FIG. 5(A), the beginning is a passage where the shakuri technique (singing while rising from a pitch lower than the reference pitch) is to be used. An image 506 shaped like "ノ", which suggests a rising pitch, is therefore displayed. In addition, as shown in FIG. 6, for a vibrato section a wavy image 507 resembling "ｍ" is displayed separately, in addition to the wavy linear image. This lets the singer easily grasp which singing technique to use at which timing.

In a vibrato section, the linear image may be changed to a wavy line for display. When the technique position track contains "tame" information, the start position of the note corresponding to that information (the "か" note in this example) is delayed. This makes it easier for the singer to grasp intuitively the tame, in which the start of that particular note is deliberately delayed. When the technique position track contains "kobushi" information, the linear image is raised temporarily at the position corresponding to the kobushi timing. This yields a linear image corresponding to kobushi, a singing technique in which the tone of a particular note is varied partway through its sounding, as if growling. When the technique position track contains "fall" information, the linear image is changed to a lower pitch from the position corresponding to the fall timing, which yields a linear image corresponding to the fall technique.
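How the tame, kobushi, and fall techniques might alter the points of one note's line can be sketched as follows. The function name, the technique labels as strings, the placement of the technique at the middle of the note, and the default offsets are all assumptions. In the specification the timing comes from the technique position track.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (time_ms, pitch)


def apply_technique(note: Tuple[int, int, float], technique: str,
                    delay_ms: int = 100, bump: float = 1.0,
                    drop: float = 2.0) -> List[Point]:
    """Return the polyline points for one note with a technique applied."""
    start, dur, pitch = note
    end = start + dur
    mid = start + dur // 2  # assumed technique timing: middle of the note
    if technique == "tame":
        # Delay the head of the note.
        return [(start + delay_ms, pitch), (end, pitch)]
    if technique == "kobushi":
        # Raise the line temporarily, then return to the note's pitch.
        return [(start, pitch), (mid, pitch + bump), (end, pitch)]
    if technique == "fall":
        # Slide down to a lower pitch from the technique timing onward.
        return [(start, pitch), (mid, pitch), (end, pitch - drop)]
    return [(start, pitch), (end, pitch)]
```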

Furthermore, based on the breath timing information in the breath position track, the CPU 11 may display an image prompting a breath (for example, an image like "Ｖ") together with the linear image.

The CPU 11 may also read information indicating the volume of each note contained in the guide melody track and change the linear image according to the volume of each note. For example, the thickness or the color of the line is changed according to the volume.
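One possible mapping from note volume to line thickness is a clamped linear scale, sketched below. The volume range of 0 to 127 (as in MIDI velocity) and the pixel limits are assumptions. The specification does not state how volume is encoded or how thickness is derived from it.

```python
def line_width(volume: int, min_px: float = 2.0, max_px: float = 10.0,
               max_volume: int = 127) -> float:
    """Map a note volume to a line thickness in pixels (linear, clamped)."""
    v = max(0, min(volume, max_volume))  # clamp to the assumed volume range
    return min_px + (max_px - min_px) * v / max_volume
```

A color could be derived in the same way by interpolating between two colors with the ratio `v / max_volume`.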

When different lyrics are sung in succession at the same pitch, the user may find it hard to tell when to sing the next lyric if the linear images are connected. Therefore, as shown in FIG. 5(A), the CPU 11 displays, for example, a sphere image superimposed on the linear image at the sounding timing of each note in the guide melody track. This lets the user see that a note is to be sounded at the timing at which the sphere image is displayed. In the example of FIG. 5(A), small sphere images are displayed at beat timings and large sphere images at sounding timings.

In the drawing processing, the CPU 11 also reads the lyrics track and displays a lyric image in association with each linear image.

Next, the CPU 11 performs scoring processing as the karaoke performance progresses (s17). As described above, the scoring is performed by comparison with the reference data.

The CPU 11 transmits to the center 1 the scoring result of the scoring processing and the singing data for the singing sound input from the microphone 16 (s18). The center 1 receives the scoring results and singing data from each karaoke device 7 and transmits the singing data to the corresponding other karaoke device 7 (s53). The singing voice at each karaoke device 7 is thereby transmitted to the other karaoke devices 7. At each karaoke device 7, the singer's own singing voice and the duet partner's singing voice are mixed and emitted from the speaker 21. The center 1 also aggregates the scoring results received from the karaoke devices 7 and then transmits the aggregate to each karaoke device 7. Each karaoke device 7 thus obtains a scoring result that combines its own result with that of the duet partner and, for example, displays a score based on the aggregated result.
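The aggregation performed by the center can be sketched as below. The function name, the dictionary layout, and the choice of a plain sum as the aggregate are assumptions. The specification says only that the results are aggregated and sent back to each karaoke device.

```python
from typing import Dict


def aggregate_scores(results: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Combine per-device scores and build the reply for each device.

    results maps a (hypothetical) device identifier to that device's score.
    """
    total = sum(results.values())  # assumed aggregate: simple sum
    return {
        device_id: {"own": score, "total": total}
        for device_id, score in results.items()
    }
```

Each device can then display either its own score or the combined score, depending on the display mode.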

The above processing continues until the performance of the song ends. The CPU 11 of the karaoke device 7 determines whether the performance of the song has ended (s19) and, until it ends, scrolls the linear image and moves the character image as the song progresses. It also displays the score corresponding to the scoring result. Likewise, the CPU 31 of the center 1 determines whether the performance of the song has ended (s54) and, until it ends, exchanges singing data and scoring results with each karaoke device 7.

This embodiment shows the karaoke device 7 performing the karaoke performance and displaying the linear image from a subjective viewpoint. However, the reference display device of the present invention can also be realized using an information processing device owned by the user, such as a PC, a smartphone, or a game console (one equipped with a microphone, a speaker, and a display unit). The music data and the reference data need not be stored in the display device. They may be downloaded from a server each time they are used.

This embodiment shows an example in which the guide melody in karaoke is displayed as a linear image. The same effect is obtained when, for example, the pitch changes of a model wind-instrument performance are displayed as a linear image that breaks at breath timings. Likewise, in language learning, for example, the same effect is obtained by displaying a linear image that shows the sounding timing and sounding length of a model pronunciation and that breaks at breath timings and at each sokuon.

FIG. 5 shows a mode in which a character image corresponding to the user and character images corresponding to other users are displayed, but displaying a plurality of linear images and a plurality of character images is not essential to the present invention. For example, as shown in FIG. 9(A), the character image may be omitted and only the linear image may be displayed from a subjective viewpoint in which the depth direction corresponds to the time axis and the planar direction corresponds to pitch. The example of FIG. 9(A) also displays lyrics, but displaying lyrics is not essential to the present invention either. In this case too, as shown in FIG. 9(B), the linear image extending in the depth direction is displayed so that it approaches the viewer as time passes. The user can therefore grasp the sounding timing and pitch of each note intuitively and can enjoy singing, playing an instrument, and so on.
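The subjective viewpoint, in which depth corresponds to time and the vertical direction corresponds to pitch, can be sketched with a simple perspective projection. The function name, the projection formula, and every default parameter (focal length, time scale, pixel scale, horizon position) are assumptions chosen for illustration. The specification does not define a particular projection.

```python
from typing import Optional, Tuple


def project(note_time_ms: float, pitch: float, now_ms: float,
            focal: float = 300.0, ms_per_unit: float = 10.0,
            center_pitch: float = 60.0, px_per_semitone: float = 20.0,
            horizon_y: float = 240.0) -> Optional[Tuple[float, float]]:
    """Project a point of the line image to (screen_y, scale).

    Depth grows with how far in the future the point is, so the point
    approaches the viewer as now_ms advances. Returns None once the
    point is in the past (behind the viewer).
    """
    depth = (note_time_ms - now_ms) / ms_per_unit
    if depth < 0:
        return None
    scale = focal / (focal + depth)  # 1.0 at the viewer, smaller when far
    # Higher pitch is drawn higher on the screen (smaller y).
    screen_y = horizon_y - (pitch - center_pitch) * px_per_semitone * scale
    return screen_y, scale
```

Calling `project` for the same point with an increasing `now_ms` gives a growing `scale`, which corresponds to the linear image approaching the viewer as the song progresses.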

As shown in FIG. 10, the reference display device of the present invention need only include the monitor 24, which is a display unit, and the CPU 11, which functions as an image processing unit that performs linear image display processing, with the CPU 11 displaying from a subjective viewpoint a linear image based on the music data stored in the HDD 13 (an example of the reference data of the present invention). The other hardware components are not essential elements of the present invention.

As described above, the reference data need not be stored in the HDD 13 and may be downloaded from an external source (for example, a server) each time it is used. The decoder 22, the display processing unit 23, and the RAM 12 may also be built into the CPU 11 as part of its functions.

DESCRIPTION OF SYMBOLS
1 ... Center
2 ... Network
3 ... Karaoke store
5 ... Relay device
7 ... Karaoke device
9 ... Remote control
11 ... CPU
12 ... RAM
13 ... HDD
15 ... Touch panel
16 ... Microphone
17 ... A/D converter
18 ... Sound source
19 ... Mixer
20 ... Sound system
21 ... Speaker
22 ... Decoder
23 ... Display processing unit
24 ... Monitor
25 ... Operation unit
26 ... Transmission/reception unit
31 ... CPU
32 ... RAM
33 ... HDD
101 ... Character image
101A, 101B, 101C ... Character images
501A, 501B, 501C ... Linear images

Claims

A reference display device comprising:
a display unit; and
an image processing unit that displays on the display unit, based on reference data, a linear image indicating a sounding timing, a pitch, and a sounding length,
wherein the image processing unit displays the linear image from a subjective viewpoint in which a depth direction corresponds to a time axis and a planar direction corresponds to pitch.

The reference display device according to claim 1, wherein the image processing unit displays a plurality of the linear images, each based on different reference data, side by side from the subjective viewpoint.

The reference display device according to claim 1, wherein the image processing unit displays an image corresponding to the user at a position corresponding to the current sounding timing, and scrolls the linear image so that the image corresponding to the user moves along the linear image.

The reference display device according to claim 1, further comprising:
input means for inputting a user's sounding result; and
scoring means for scoring by comparing the user's sounding result with the reference data,
wherein the image processing unit displays a scoring result of the scoring means together with the linear image.

A program that causes an information processing device equipped with a display unit to execute an image processing step of displaying on the display unit, based on reference data, a linear image indicating a sounding timing, a pitch, and a sounding length,
wherein the image processing step displays the linear image from a subjective viewpoint in which a depth direction corresponds to a time axis and a planar direction corresponds to pitch.