JP2009111506A - Apparatus for generating monitor information

Info

Publication number: JP2009111506A
Application number: JP2007279328A
Authority: JP
Inventors: Tomoyuki Udagawa; 智之宇田川
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2007-10-26
Filing date: 2007-10-26
Publication date: 2009-05-21

Abstract

PROBLEM TO BE SOLVED: To generate display information that allows a monitoring operator to easily confirm whether an event occurred in the past, and the content of the event at the time it was detected, without requiring the operator to be permanently stationed.

SOLUTION: An apparatus includes: a reference data storage part 608 that prestores reference sound data in which typical sound data for the occurrence of an event is associated with event content data showing the content of the event; a sound analysis part 609 that, when reference sound data matching the sound data under analysis received from a microphone 20 exists, extracts the associated event content data; a sound information generation part 610 that generates sound information display data for displaying, as a graph on a time axis, the sound pressure level values of the sound data received from a prescribed period ago up to the present, and that associates the event content data with the corresponding position on the time axis; and a data superposing part 612 that superposes the generated sound information display data on image data received from a monitoring camera 10 to generate display data for display.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a monitoring information generation device that generates display information for displaying video and audio information about a monitored area.

In recent years, network-type monitoring systems, which use a communication network such as Ethernet (registered trademark) as the transmission medium for sending digital video and audio signals captured by monitoring cameras to a display device or recording device, have spread rapidly.

In such a network-type monitoring system, video and audio signals captured by multiple monitoring cameras can be multiplexed and transmitted over a single cable.

Consequently, by wiring network cables in the monitored facility in advance, the number of installed monitoring cameras can be changed freely afterward.

In addition, in such a network-type monitoring system, control signals for controlling the shooting direction and scale of a monitoring camera (pan, tilt, zoom, and so on) can also be sent remotely over the network cable. Unlike a monitoring system that transmits analog signals, no separate wiring for control signals is required.

Conventionally, such monitoring systems have relied mainly on video information alone, but there is growing demand for adding audio information in order to understand the situation at the monitored location in more detail.

Adding audio information to the monitoring targets raises the likelihood of detecting information about areas not shown in the video, for example, when a suspicious person breaks glass in an area outside the camera's view. Techniques for monitoring systems that incorporate audio information are described in Patent Documents 1 and 2.

Patent Document 1 describes a monitoring monitor device that not only presents the video and audio information obtained from the monitored area to the operator but also, when an abnormal sound is detected by analyzing the audio information, switches the displayed video to that captured by the monitoring camera at the location where the abnormal sound occurred. This device makes it easy to check the video when an abnormal sound occurs in the monitored area.

However, the monitoring monitor device of Patent Document 1 requires the operator to be stationed in front of the monitor at all times, and it is difficult to retain information about events that occur while the operator is absent.

Patent Document 2 describes a monitoring system device that, along with displaying video information, displays a history of the audio information acquired from the past up to the present as a graph on the monitor. With this device, the operator can check the history of previously acquired audio information without being stationed in front of the monitor.

Patent Document 1: JP 2004-139261 A
Patent Document 2: JP 2001-307252 A

With the monitoring monitor device of Patent Document 1, the switching of the display lets the operator know that some abnormality has occurred. However, it provides no means of knowing how much time has elapsed since the abnormality occurred, or of knowing what kind of abnormality occurred when the source of the abnormal sound was not captured in the video.

The technique of Patent Document 2 only shows how the audio information changed; obtaining meaningful information about an abnormality from the displayed information requires the separate task of checking the monitoring video.

The present invention has been made in view of the above problems. Its object is to provide a monitoring information generation device that generates display information allowing the presence or absence of past events, and the content of each event at the time it was detected, to be confirmed easily without requiring an operator to be stationed at the monitor.

To achieve the above object, a monitoring information generation device (60-1 to 60-3) of the present invention is a monitoring information generation device (60-1 to 60-3) in a monitoring system (1) that includes monitoring cameras (10-1 to 10-3) that capture a monitored area, microphones (20-1 to 20-3) that collect sound in and near the monitored area, display devices (70-1 to 70-3), and the monitoring information generation devices (60-1 to 60-3). The device is characterized by comprising:

a data receiving unit (601) that receives video data of the monitored area captured by the monitoring cameras (10-1 to 10-3) and audio data of sound in and near the monitored area collected by the microphones (20-1 to 20-3);

a reference data storage unit (608) that stores in advance a plurality of items of reference audio data, each composed of reference frequency feature data, which represents typical audio data for a sound-generating event as sound pressure level values for each predetermined frequency band, and event content data, which is text data describing the content of the sound-generating event;

a sound pressure level value calculation unit (607) that acquires the received audio data and calculates a sound pressure level value of the audio data for each fixed period;

an audio analysis unit (609) that acquires the received audio data, converts it into analysis frequency feature data expressed as sound pressure level values for each of the frequency bands, compares it band by band with each item of reference frequency feature data stored in the reference data storage unit (608), and, when reference frequency feature data matching the analysis frequency feature data exists, extracts the event content data corresponding to that reference frequency feature data;

an audio information generation unit (610) that acquires the calculated sound pressure level values from a predetermined period ago up to the present and generates audio information display data for displaying them as a graph on a time axis, and that, when the audio analysis unit (609) has extracted any event content data, places a mark indicating the occurrence of an event at the position on the time axis corresponding to the time of extraction and associates the extracted event content data with the mark;

a data superimposing unit (612) that superimposes the generated audio information display data on the received video data to generate display data for display; and

a video output unit (613) that outputs the generated display data to the display devices (70-1 to 70-3).

A monitoring information generation device (60-1 to 60-3) according to another aspect of the present invention is a monitoring information generation device (60-1 to 60-3) in a monitoring system (2) that includes monitoring cameras (10-1 to 10-3) that capture a monitored area, microphones (20-1 to 20-3) that collect sound in and near the monitored area, display devices (70-1 to 70-3), and the monitoring information generation devices (60-1 to 60-3). The device is characterized by comprising:

a data receiving unit (601) that receives video data of the monitored area captured by the monitoring cameras (10-1 to 10-3) and audio data of sound in and near the monitored area collected by the microphones (20-1 to 20-3);

a reference data storage unit (608) that stores in advance a plurality of items of reference audio data, each composed of reference variation pattern data, which represents the variation pattern over a predetermined period of the sound pressure level values of typical audio data for a sound-generating event, and event content data, which is text data describing the content of the sound-generating event;

a sound pressure level value calculation unit (607) that acquires the received audio data and calculates a sound pressure level value of the audio data for each fixed period;

an audio analysis unit (609) that acquires the calculated sound pressure level values as analysis target data, compares the analysis target data with each item of reference variation pattern data stored in the reference data storage unit (608) for each predetermined time width, and, when reference variation pattern data matching the analysis target data exists, extracts the event content data corresponding to that reference variation pattern data;

an audio information generation unit (610) that acquires the calculated sound pressure level values from a predetermined period ago up to the present and generates audio information display data for displaying them as a graph on a time axis, and that, when the audio analysis unit (609) has extracted any event content data, places a mark indicating the occurrence of an event at the position on the time axis corresponding to the time of extraction and associates the event content data extracted by the audio analysis unit (609) with the mark;

a data superimposing unit (612) that superimposes the generated audio information display data on the received video data to generate display data for display; and

a video output unit (613) that outputs the generated display data to the display devices (70-1 to 70-3).

According to the monitoring information generation device of the present invention, display information can be generated that allows the presence or absence of past sound-generating events, and the content of each event at the time it was detected, to be confirmed easily without requiring an operator to be stationed at the monitor.

<< First Embodiment >>
The monitoring information generation device according to the first embodiment of the present invention determines, based on the frequency characteristics of sound in and near the monitored area, whether an unusual sound-producing situation (referred to as an event or a sound-generating event) has occurred. When it determines that an event has occurred, it identifies the content of the event and causes this information to be displayed on a display device.

<Configuration of Monitoring System According to First Embodiment>
The configuration of a monitoring system to which the monitoring information generation device of this embodiment is applied is described with reference to FIG. 1.

<Monitoring System According to First Embodiment>
The monitoring system 1 according to this embodiment includes: a plurality of monitoring cameras 10-1 to 10-3; microphones 20-1 to 20-3 connected to the respective monitoring cameras 10-1 to 10-3; a network hub 30 connected to the monitoring cameras 10-1 to 10-3; a network hub 50 connected to the network hub 30 via a network 40; a plurality of monitoring information generation devices 60-1 to 60-3 connected to the network hub 50 and corresponding to the respective monitoring cameras 10-1 to 10-3; and displays 70-1 to 70-3 (serving as display devices) and speakers 80-1 to 80-3 connected to the respective monitoring information generation devices 60-1 to 60-3.

Each of the monitoring cameras 10-1 to 10-3 captures its monitored area to generate video data, compresses the data in a format such as JPEG, and sends it to the corresponding monitoring information generation device 60-1 to 60-3 via the network hub 30, the network 40, and the network hub 50.

Each of the monitoring cameras 10-1 to 10-3 also acquires audio data from its connected microphone 20-1 to 20-3, compresses the data in a format such as MP3, and, as with the video data, sends it to the corresponding monitoring information generation device 60-1 to 60-3 via the network hub 30, the network 40, and the network hub 50.

Each of the microphones 20-1 to 20-3 picks up sound in and near the monitored area, converts it into audio data, and sends the data to the connected monitoring camera 10-1 to 10-3.

As shown in FIG. 2 (where it is labeled monitoring information generation device 60), each of the monitoring information generation devices 60-1 to 60-3 has a data receiving unit 601, a video decoding unit 602, a video memory unit 603, an audio decoding unit 604, an audio memory unit 605, an audio output unit 606, a sound pressure level value calculation unit 607, a reference data storage unit 608, an audio analysis unit 609, an audio information generation unit 610, an audio information storage unit 611, a data superimposing unit 612, and a video output unit 613.

The data receiving unit 601 receives the compressed video data and audio data sent from the monitoring cameras 10-1 to 10-3. The video decoding unit 602 decompresses the compressed video data received by the data receiving unit 601. The video memory unit 603 stores the video data decoded by the video decoding unit 602.

The audio decoding unit 604 decodes the compressed audio data received by the data receiving unit 601. The audio memory unit 605 stores the audio data decoded by the audio decoding unit 604. The audio output unit 606 acquires the digital audio data stored in the audio memory unit 605 at a fixed interval, converts it into analog audio data, and sends it to the connected speaker 80-1 to 80-3.

The sound pressure level value calculation unit 607 acquires the audio data decoded by the audio decoding unit 604 and calculates a sound pressure level value for each fixed period.

The reference data storage unit 608 stores in advance a plurality of items of reference audio data. Each item consists of reference frequency feature data, which represents typical audio data at the occurrence of an event (such as the sound of glass breaking or of a door closing) as sound pressure level values for each predetermined frequency band, and event content data, which is text data describing the content of the event.

The audio analysis unit 609 acquires the audio data decoded by the audio decoding unit 604 and converts it into analysis frequency feature data, that is, analysis target data expressed as sound pressure level values for the same frequency bands as the reference frequency feature data. It then compares this data band by band with each item of reference frequency feature data stored in the reference data storage unit 608 and, if any reference frequency feature data matches the converted analysis frequency feature data, extracts the event content data corresponding to that reference frequency feature data. The FFT (fast Fourier transform) can be used to convert the audio data into analysis frequency feature data.
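The conversion of audio samples into per-band level values can be sketched as follows. This is a minimal illustration, not the method of the specification: the source only states that an FFT can be used, so the equal-width band split, the power-in-decibels scale, and the names `fft` and `band_levels` are all assumptions made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_levels(samples, num_bands=6):
    """Return one level value (dB, arbitrary reference) per frequency band.

    Assumption: the spectrum below the Nyquist frequency is split into
    `num_bands` equal-width bands; leftover bins at the top are ignored.
    """
    spectrum = fft(samples)[: len(samples) // 2]
    width = len(spectrum) // num_bands
    levels = []
    for m in range(num_bands):
        power = sum(abs(c) ** 2 for c in spectrum[m * width:(m + 1) * width])
        # Clamp so that an empty band does not produce log10(0).
        levels.append(10.0 * math.log10(max(power, 1e-12)))
    return levels
```

With six bands, the result has the same shape as the six-band feature data described for FIG. 5, so it could be compared band by band against stored reference data.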

The audio information generation unit 610 acquires the past sound pressure level values calculated by the sound pressure level value calculation unit 607 and generates audio information display data for displaying them as a graph on a time axis. When the audio analysis unit 609 has extracted any event content data, the audio information generation unit 610 places a mark indicating the occurrence of an event at the position on the time axis corresponding to the time of extraction, and associates the event content data extracted by the audio analysis unit 609 with that mark.
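One way to picture the audio information display data is as a rolling history of level values plus event marks keyed by time. The sketch below is hypothetical: the class name `AudioInfoDisplay`, its fields, and the use of integer timestamps in seconds are assumptions for illustration and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class AudioInfoDisplay:
    """Rolling level history with event marks (hypothetical structure)."""
    history_s: int                                 # length of the displayed period
    points: list = field(default_factory=list)    # (timestamp, level) pairs
    marks: dict = field(default_factory=dict)     # timestamp -> event content text

    def add(self, timestamp, level, event_text=None):
        """Append one level value; attach a mark if an event was extracted."""
        self.points.append((timestamp, level))
        if event_text is not None:
            self.marks[timestamp] = event_text
        # Keep only values from the predetermined period up to the present.
        cutoff = timestamp - self.history_s
        self.points = [(t, v) for t, v in self.points if t > cutoff]
        self.marks = {t: e for t, e in self.marks.items() if t > cutoff}
```

A renderer could then draw `points` as the graph and place a marker at each key of `marks`, showing the associated text on demand.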

The audio information storage unit 611 stores the audio information display data generated by the audio information generation unit 610.

The data superimposing unit 612 acquires the video data accumulated in the video memory unit 603 and the audio information display data stored in the audio information storage unit 611, and superimposes the audio information display data on the acquired video data to generate display data for display.

The video output unit 613 performs D/A conversion on the display data generated by the data superimposing unit 612 and sends it to whichever of the displays 70-1 to 70-3 is connected.

Each of the displays 70-1 to 70-3 acquires and displays the display data sent from the connected monitoring information generation device 60-1 to 60-3.

Each of the speakers 80-1 to 80-3 acquires and outputs the audio data sent from the audio output unit 606.

<Operation of Monitoring System According to First Embodiment>
Next, the operation of the monitoring system 1 according to this embodiment is described.

Each of the monitoring cameras 10-1 to 10-3 of the monitoring system 1 captures its monitored area to generate video data and compresses this video data in a format such as JPEG.

Each of the microphones 20-1 to 20-3 connected to the monitoring cameras 10-1 to 10-3 converts sound in and near the monitored area into audio data and sends it to the connected monitoring camera 10-1 to 10-3, where it is compressed in a format such as MP3.

The compressed video data and audio data are sent from the monitoring cameras 10-1 to 10-3 to the corresponding monitoring information generation devices 60-1 to 60-3 via the network hub 30, the network 40, and the network hub 50.

Next, the processing by which the monitoring information generation devices 60-1 to 60-3 receive the compressed video data and audio data and generate display data to be shown on the connected displays 70-1 to 70-3 is described with reference to the flowchart of FIG. 3.

First, when the data receiving unit 601 of a monitoring information generation device 60-1 to 60-3 receives compressed video data and audio data (S1), the compressed video data is decompressed by the video decoding unit 602 and stored in the video memory unit 603. The compressed audio data received by the data receiving unit 601 is decoded by the audio decoding unit 604 and stored in the audio memory unit 605 (S2).

The audio data decoded by the audio decoding unit 604 is also acquired by the sound pressure level value calculation unit 607, which calculates a sound pressure level value for each fixed period (for example, every second) (S3).
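As a rough illustration of step S3, the sketch below computes one level value per fixed period. The specification does not say how the sound pressure level value is calculated, so the RMS-in-decibels formula, the function name `sound_pressure_levels`, and the `ref` reference amplitude are assumptions.

```python
import math


def sound_pressure_levels(samples, sample_rate, period_s=1.0, ref=1.0):
    """Return one level value in dB for each full period of `period_s` seconds.

    Assumption: the level is the RMS amplitude of the period, expressed in
    decibels relative to `ref`. A trailing partial period is discarded.
    """
    window = int(sample_rate * period_s)
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        # Clamp to avoid log10(0) on digital silence.
        levels.append(20.0 * math.log10(max(rms, 1e-12) / ref))
    return levels
```

Each returned value corresponds to one point on the time-axis graph that the audio information generation unit 610 later builds.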

The audio data decoded by the audio decoding unit 604 is also acquired by the audio analysis unit 609, where the audio data for each fixed period is converted into analysis frequency feature data, that is, analysis target data expressed as sound pressure level values for the same frequency bands as the reference frequency feature data.

In the audio analysis unit 609, the converted analysis frequency feature data is compared band by band with each item of reference frequency feature data stored in the reference data storage unit 608, and it is determined whether any reference frequency feature data matches the analysis frequency feature data (S4).

The comparison of analysis frequency feature data with reference frequency feature data in the audio analysis unit 609 is described with reference to the flowchart of FIG. 4 and the example frequency feature data of FIG. 5.

The reference data storage unit 608 stores a plurality of items of n-th reference frequency feature data. In FIG. 5(a), three items are stored: the first to third reference frequency feature data (n = 1 to 3).

Each item of reference frequency feature data is expressed as a sound pressure level value for each divided frequency band (m). In FIG. 5(a), the data is expressed as sound pressure level values for six frequency bands (1) to (6) (m = 1 to 6).

FIG. 5(b) is an example of analysis frequency feature data converted by the audio analysis unit 609. Here, the analysis frequency feature data is likewise expressed as sound pressure level values for the same frequency bands (m) (m = 1 to 6) as the reference frequency feature data.

When the audio analysis unit 609 starts comparing the analysis frequency feature data with the reference frequency feature data, it acquires one item, the n-th reference frequency feature data. The initial value of n is 1, so the first reference frequency feature data is acquired first, and m is set to its initial value of 1 (S11, S12).

Once the first reference frequency feature data has been acquired, it is compared with the analysis frequency feature data for one frequency band (m). Because the initial value of m is 1, the comparison starts with frequency band (1).

The comparison determines whether the sound pressure level value of frequency band (1) of the analysis frequency feature data falls within a fixed variable range around the sound pressure level value of frequency band (1) of the reference frequency feature data (S13).

This fixed variable range is the tolerance within which audio data in the frequency band (m) is regarded as originating from the same event. Here, values within ±5% of the reference sound pressure level value of the band are defined as being within the variable range.
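The per-band test of step S13 can be expressed as a small predicate. This is a sketch under assumptions: the function name `within_variable_range` is hypothetical, and the ±5% is interpreted as 5% of the magnitude of the reference level value, which the text implies but does not spell out.

```python
def within_variable_range(analysis_level, reference_level, tolerance=0.05):
    """Return True if the analysis level lies within the variable range.

    Assumption: the range is reference_level +/- tolerance * |reference_level|.
    """
    allowed = abs(reference_level) * tolerance
    return abs(analysis_level - reference_level) <= allowed
```

For a reference level of 60, for example, analysis values from 57 to 63 would be accepted under this interpretation.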

図５においては、解析周波数特徴データの周波数帯域(1)の音圧レベル値は第１参照周波数特徴データの周波数帯域(1)の可変範囲内には含まれないと判定され（Ｓ１３の「NO」）、ステップＳ１４に移動する。 In FIG. 5, it is determined that the sound pressure level value in the frequency band (1) of the analysis frequency feature data is not included in the variable range of the frequency band (1) of the first reference frequency feature data (“NO” in S13). ]), And moves to step S14.

ステップＳ１４において第２参照周波数特徴データの処理に移り、ステップＳ１２に戻って第２参照周波数特徴データが取得される（Ｓ１４、Ｓ１５、Ｓ１２）。 In step S14, the process proceeds to the processing of the second reference frequency feature data, and the process returns to step S12 to acquire the second reference frequency feature data (S14, S15, S12).

第２参照周波数特徴データが取得されると、周波数帯域(m)について第２参照周波数特徴データと解析周波数特徴データとが比較される。mの初期値は１となり、まずは周波数帯域(1)について第２参照周波数特徴データと解析周波数特徴データとが比較される。 When the second reference frequency feature data is acquired, the second reference frequency feature data and the analysis frequency feature data are compared for the frequency band (m). The initial value of m is 1, and first, the second reference frequency feature data and the analysis frequency feature data are compared for the frequency band (1).

In FIG. 5, the sound pressure level value of frequency band (1) in the analysis frequency feature data is determined not to fall within the variable range of frequency band (1) in the second reference frequency feature data ("NO" in S13), and the process moves to step S14.

In step S14, processing moves on to the third reference frequency feature data, and the process returns to step S12, where the third reference frequency feature data is acquired (S14, S15, S12).

When the third reference frequency feature data has been acquired, it is compared with the analysis frequency feature data for frequency band (m). The value of m is reset to its initial value of 1, so the comparison again starts with frequency band (1).

In FIG. 5, the sound pressure level value of frequency band (1) in the analysis frequency feature data is determined to fall within the variable range of frequency band (1) in the third reference frequency feature data ("YES" in S13), and the process moves to step S16.

In step S16, processing moves on to frequency band (2), and the process returns to step S13, where it is determined whether the sound pressure level value of frequency band (2) in the analysis frequency feature data falls within the variable range of the sound pressure level value of frequency band (2) in the third reference frequency feature data (S16, S17, S13).

In FIG. 5, the sound pressure level value of frequency band (2) in the analysis frequency feature data is determined to fall within the variable range of frequency band (2) in the third reference frequency feature data ("YES" in S13), and the process moves to step S16.

In FIG. 5, steps S13 and S16 are repeated up to frequency band (6) (m = 6). As a result, the sound pressure level value of the analysis frequency feature data is determined to fall within the variable range of the sound pressure level value of the third reference frequency feature data in every frequency band (S13, S16, S17), and the third reference frequency feature data is determined to be the data corresponding to the analysis frequency feature data (S18).

If, in step S13, the sound pressure level value of frequency band (m) in the analysis frequency feature data is determined not to fall within the variable range of frequency band (m) in the n-th reference frequency feature data, and the comparison has been completed for all reference frequency feature data in the reference data storage unit 608 ("YES" in S15), it is determined that no reference frequency feature data corresponds to the analysis frequency feature data (S19).

This concludes the description of how the audio analysis unit 609 compares the analysis frequency feature data with the reference frequency feature data.
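The comparison in steps S12 to S19 can be sketched as a nested loop over reference entries and frequency bands. This is a minimal illustration, not the implementation in the specification. The function name, the dictionary layout (`"event"`, `"bands"`) and all numeric values are assumptions made here. Only the step numbers, the ±5% range and the "glass breaking sound" event come from the text.

```python
def find_matching_event(analysis_bands, references, tolerance=0.05):
    """Return the event text of the first matching reference, or None.

    analysis_bands: sound pressure level per frequency band (hypothetical layout).
    references: list of {"event": str, "bands": [level, ...]} (hypothetical layout).
    """
    for ref in references:                       # S12 / S14: take the n-th reference
        ref_bands = ref["bands"]
        if len(ref_bands) != len(analysis_bands):
            continue                             # band layouts must agree
        for a, r in zip(analysis_bands, ref_bands):   # S13 / S16: band m = 1, 2, ...
            margin = abs(r) * tolerance
            if not (r - margin <= a <= r + margin):
                break                            # "NO" in S13: try the next reference
        else:
            return ref["event"]                  # S18: every band was within range
    return None                                  # S19: no reference corresponds


# Illustrative data only; the levels are invented for this sketch.
references = [
    {"event": "event A", "bands": [80, 70, 60, 50, 40, 30]},
    {"event": "event B", "bands": [30, 40, 50, 60, 70, 80]},
    {"event": "glass breaking sound", "bands": [50, 55, 60, 65, 70, 75]},
]
analysis = [51, 54, 61, 64, 71, 74]
print(find_matching_event(analysis, references))
```

As in the walk-through of FIG. 5, the first two references are rejected at frequency band (1), and the third matches in all six bands.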

Returning to FIG. 3, when it is determined in step S4 that there is reference frequency feature data corresponding to the analysis frequency feature data, the event content data associated with that reference frequency feature data is extracted (S5).

When event content data has been extracted in step S5, or when it is determined in step S4 that no reference frequency feature data corresponds to the analysis frequency feature data ("NO" in S4), the audio information generation unit 610 acquires the past sound pressure level values calculated by the sound pressure level value calculation unit 607 and generates audio information display data for displaying them as a graph on a time axis (S6). The sound pressure level values acquired here preferably cover roughly the past several tens of minutes to several hours.

If any event content data has been extracted by the audio analysis unit 609, the audio information generation unit 610 places a mark indicating the occurrence of an event in the audio information display data, at the position on the time axis of the graph data that corresponds to the time at which the event content data was extracted. It also associates the event content data extracted by the audio analysis unit 609 with this mark.
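One way to picture the audio information display data is as a windowed time series plus a list of marks. The sketch below is an assumption-laden illustration: the class `AudioInfoDisplayData`, the function `build_display_data`, the use of seconds as timestamps and the tuple layouts are all hypothetical and are not defined in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class AudioInfoDisplayData:
    times: list                                  # timestamps in seconds, oldest first
    levels: list                                 # sound pressure level at each timestamp
    marks: list = field(default_factory=list)    # (timestamp, event_text) pairs


def build_display_data(history, now, window_seconds, detections):
    """Keep only the samples and detections inside [now - window_seconds, now].

    history: list of (timestamp, level) from the level calculation step.
    detections: list of (timestamp, event_text) from the analysis step.
    """
    start = now - window_seconds
    points = [(t, v) for t, v in history if start <= t <= now]
    marks = [(t, text) for t, text in detections if start <= t <= now]
    return AudioInfoDisplayData(
        times=[t for t, _ in points],
        levels=[v for _, v in points],
        marks=marks,
    )
```

Storing each mark as a timestamp paired with its event text keeps the association between the mark and the event content data explicit, so that a display layer can look up the text later.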

The audio information display data generated by the audio information generation unit 610 is stored in the audio information storage unit 611 (S7).

Next, the data superimposing unit 612 acquires the video data accumulated in the video memory unit 603 and the audio information display data stored in the audio information storage unit 611, and superimposes the audio information display data on the acquired video data to generate display data for display (S8).

The display data generated by the data superimposing unit 612 is D/A-converted by the video output unit 613 and transmitted to the connected display among the displays 70-1 to 70-3.

The display data transmitted from the video output unit 613 is displayed on the respective displays 70-1 to 70-3. FIG. 6 shows examples of the display data displayed on the displays 70-1 to 70-3.

FIG. 6(a) shows an example in which display data generated by the monitoring information generation device 60-1, based on video data captured by the monitoring camera 10-1 and audio data acquired by the microphone 20-1, is displayed on the display 70-1. FIG. 6(b) shows an example in which display data generated by the monitoring information generation device 60-2, based on video data captured by the monitoring camera 10-2 and audio data acquired by the microphone 20-2, is displayed on the display 70-2. FIG. 6(c) shows an example in which display data generated by the monitoring information generation device 60-3, based on video data captured by the monitoring camera 10-3 and audio data acquired by the microphone 20-3, is displayed on the display 70-3.

Each set of display data consists of video data 71-1 to 71-3 and audio information display data 72-1 to 72-3.

Within the audio information display data 72-1 to 72-3, marks 73-1 to 73-3 are displayed at the time positions at which an event was recognized as having occurred. When the cursor is moved close to any of the marks 73-1 to 73-3, the event content data associated with that mark is displayed.

In FIG. 6(b), the cursor has been moved close to the mark 73-2, so text data 74-2 reading "glass breaking sound", which is the event content data associated with the mark 73-2, is displayed.
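The cursor behaviour can be sketched as a nearest-mark lookup along the time axis. The specification does not say how "close" is decided, so the pixel `radius`, the function name and the `(x_pixel, event_text)` layout below are assumptions made for illustration only.

```python
def event_text_near_cursor(marks, cursor_x, radius=5):
    """Return the event text of the nearest mark within `radius` pixels, else None.

    marks: list of (x_pixel, event_text) pairs (hypothetical layout).
    radius: hypothetical proximity threshold; not specified in the text.
    """
    best = None  # (distance, event_text) of the closest mark seen so far
    for x, text in marks:
        distance = abs(x - cursor_x)
        if distance <= radius and (best is None or distance < best[0]):
            best = (distance, text)
    return best[1] if best is not None else None
```

With a mark at x = 120 labelled "glass breaking sound", a cursor at x = 123 returns that text, while a cursor at x = 200 returns nothing.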

The audio data accumulated in the audio memory unit 605 is D/A-converted by the audio output unit 606, transmitted to the connected speakers 80-1 to 80-3, and output from the speakers 80-1 to 80-3 (S9).

According to the monitoring system 1 of the first embodiment described above, the presence or absence of an event is detected based on the frequency characteristics of audio data acquired from the monitoring area and its vicinity, and when an event is detected its content can be easily confirmed. Appropriate monitoring over a predetermined period can therefore be performed without an observer being permanently stationed.

In this first embodiment, the case has been described in which, when the audio analysis unit 609 compares the analysis frequency feature data with the reference frequency feature data, the reference frequency feature data is determined to correspond to the analysis frequency feature data when the sound pressure level value of the analysis frequency feature data falls within the variable range of the sound pressure level value of the reference frequency feature data in every set frequency band. When the number of frequency bands is large (for example, several tens to several hundreds), however, the reference frequency feature data may instead be determined to correspond to the analysis frequency feature data if the sound pressure level value falls within the variable range in a certain proportion (for example, 95%) of the set frequency bands.
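The relaxed criterion for a large number of bands can be sketched as a proportion test. The function name `matches_by_ratio` and its parameters are hypothetical. The 95% proportion and the ±5% range are the example values given in the text.

```python
def matches_by_ratio(analysis_bands, reference_bands,
                     tolerance=0.05, required_ratio=0.95):
    """Return True if enough bands fall within the variable range.

    required_ratio=0.95 mirrors the "for example, 95%" proportion in the text.
    """
    if not reference_bands or len(analysis_bands) != len(reference_bands):
        return False
    hits = sum(
        1
        for a, r in zip(analysis_bands, reference_bands)
        if abs(a - r) <= abs(r) * tolerance
    )
    return hits / len(reference_bands) >= required_ratio
```

Setting `required_ratio=1.0` reproduces the strict all-bands rule described for the first embodiment.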

《Second Embodiment》
The monitoring information generation device according to the second embodiment of the present invention determines whether an event has occurred based on the variation pattern of the sound level in the monitoring area and its vicinity. When it determines that an event has occurred, it identifies the content of the event and provides this information to the observer by displaying it.

〈Configuration of the Monitoring System According to the Second Embodiment〉
The configuration of the monitoring system 2 according to this embodiment is the same as that of the monitoring system 1 according to the first embodiment, so detailed description of components having the same functions is omitted.

The reference data storage unit 608 of each of the monitoring information generation devices 60-1 to 60-3 in this embodiment stores in advance a plurality of reference audio data. Each consists of reference variation pattern data, which indicates the variation pattern over a predetermined period of the sound pressure level value of typical audio data produced when an event occurs, and event content data, which is text data indicating the content of that event.

The audio analysis unit 609 acquires the sound pressure level values calculated by the sound pressure level value calculation unit 607 as analysis target data, and compares the variation data of the sound pressure level value in each predetermined period of the analysis target data with each reference variation pattern data stored in the reference data storage unit 608. If there is reference variation pattern data corresponding to the analysis target data, it extracts the event content data associated with that reference variation pattern data.

〈Operation of the Monitoring System According to the Second Embodiment〉
Next, the operation of the monitoring system 2 according to this embodiment will be described.

In the monitoring system 2 of this embodiment, as in the first embodiment, the video data captured by the monitoring cameras 10-1 to 10-3 and the audio data of the sound in the monitoring area and its vicinity acquired by the microphones 20-1 to 20-3 are compressed and transmitted via the network 40 to the corresponding monitoring information generation devices 60-1 to 60-3.

The processing by which the monitoring information generation devices 60-1 to 60-3 receive the compressed video data and audio data and generate display data to be displayed on the connected displays 70-1 to 70-3 is the same as in the first embodiment, except for the comparison in step S4 of the flowchart of FIG. 3 that uses the data stored in the reference data storage unit 608. Detailed description is therefore omitted.

In this embodiment, the audio analysis unit 609 acquires the sound pressure level values calculated in step S3 as analysis target data, and compares the variation data of the sound pressure level value in each predetermined period of the analysis target data with each reference variation pattern data stored in the reference data storage unit 608. It then determines whether there is reference variation pattern data corresponding to the variation data of the analysis target data (S4).

The reference variation pattern data and the analysis target data are compared by average sound pressure level value for each unit time (m), where the unit times are obtained by dividing the time width of the reference variation pattern data by a preset number. The comparison follows the same procedure as the process shown in the flowchart of FIG. 4, and determines whether there is reference variation pattern data corresponding to the analysis target data.

Here too, the average sound pressure level value of each unit time of the reference variation pattern data has a variable range. If the average sound pressure level value of each unit time of the analysis target data falls within the variable range of the average sound pressure level value of the corresponding unit time of the reference variation pattern data, the audio data is recognized as originating from the same event.
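The unit-time comparison of the second embodiment can be sketched as follows. The function names, the equal-length chunking and the ±5% default are assumptions for illustration. The specification states only that the time width is divided by a preset number and that each unit-time average has a variable range.

```python
def unit_time_averages(levels, divisions):
    """Split `levels` into `divisions` equal consecutive chunks and average each.

    Samples left over after the last full chunk are ignored in this sketch.
    """
    size = len(levels) // divisions
    if divisions <= 0 or size == 0:
        raise ValueError("need at least one sample per unit time")
    return [
        sum(levels[i * size:(i + 1) * size]) / size
        for i in range(divisions)
    ]


def matches_variation_pattern(analysis_levels, reference_levels,
                              divisions, tolerance=0.05):
    """Return True if every unit-time average is within the variable range."""
    a_avg = unit_time_averages(analysis_levels, divisions)
    r_avg = unit_time_averages(reference_levels, divisions)
    return all(
        abs(a - r) <= abs(r) * tolerance
        for a, r in zip(a_avg, r_avg)
    )
```

For a reference pattern of `[40, 40, 80, 80, 50, 50]` split into three unit times, the averages are 40, 80 and 50. An analysis series such as `[41, 39, 82, 78, 51, 50]` matches, while one that lacks the loud middle segment does not.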

The subsequent processing in steps S5 to S9 is the same as that described in the first embodiment, so detailed description is omitted.

According to the monitoring system 2 of the second embodiment described above, the presence or absence of an event is detected based on the variation data of the sound acquired from the monitoring area and its vicinity, and when an event is detected its content can be easily confirmed. Appropriate monitoring over a predetermined period can therefore be performed without an observer being permanently stationed.

In this second embodiment, the case has been described in which, when the audio analysis unit 609 compares the analysis target data with the reference variation pattern data, the reference variation pattern data is determined to correspond to the analysis target data when the sound pressure level value of the analysis target data falls within the variable range of the sound pressure level value of the reference variation pattern data in every divided unit time. When the time width of the reference variation pattern data is divided into many parts (for example, several tens to several hundreds), however, the reference variation pattern data may instead be determined to correspond to the analysis target data if the sound pressure level value falls within the variable range in a certain proportion (for example, 95%) of the divided unit times.

It is also possible to implement the functional configuration of the monitoring information generation device of each of the above embodiments as a program and install it on a computer, thereby causing that computer to function as the monitoring information generation device.

FIG. 1 is an overall view showing the configuration of the monitoring systems 1 and 2 that use the monitoring information generation device 60 according to the first and second embodiments of the present invention.
FIG. 2 is a block diagram showing the configuration of the monitoring information generation device 60 according to the first and second embodiments of the present invention.
FIG. 3 is a flowchart showing the operation of the monitoring information generation device 60 according to the first and second embodiments of the present invention.
FIG. 4 is a flowchart showing the operation of the audio analysis unit 609 of the monitoring information generation device 60 according to the first and second embodiments of the present invention.
FIG. 5 shows (a) an example of reference frequency feature data stored in the reference data storage unit 608 of the monitoring information generation device 60 according to the first embodiment of the present invention, and (b) an example of analysis frequency feature data converted by the audio analysis unit 609.
FIG. 6 shows examples (a) to (c) of display data displayed on the displays 70-1 to 70-3 connected to the monitoring information generation device 60 according to the first and second embodiments of the present invention.

Explanation of symbols

1 … Monitoring system
2 … Monitoring system
10 … Monitoring camera
20 … Microphone
30 … Network hub
40 … Network
50 … Network hub
60 … Monitoring information generation device
70 … Display
71 … Video data
72 … Audio information display data
73 … Mark
74 … Text data
80 … Speaker
601 … Data receiving unit
602 … Video decoding unit
603 … Video memory unit
604 … Audio decoding unit
605 … Audio memory unit
606 … Audio output unit
607 … Sound pressure level value calculation unit
608 … Reference data storage unit
609 … Audio analysis unit
610 … Audio information generation unit
611 … Audio information storage unit
612 … Data superimposing unit
613 … Video output unit

Claims

A monitoring information generation device in a monitoring system comprising a monitoring camera that images a monitoring area, a microphone that collects sound in the monitoring area and its vicinity, a display device, and the monitoring information generation device, the monitoring information generation device comprising:
a data receiving unit that receives video data of the monitoring area captured by the monitoring camera and audio data of the sound in the monitoring area and its vicinity collected by the microphone;
a reference data storage unit that stores in advance a plurality of reference audio data, each consisting of reference frequency feature data, which represents typical audio data corresponding to a sound-generating event as a sound pressure level value for each predetermined frequency band, and event content data, which is text data indicating the content of the sound-generating event;
a sound pressure level value calculation unit that acquires the received audio data and calculates a sound pressure level value of the audio data for each predetermined period;
an audio analysis unit that acquires the received audio data, converts the audio data into analysis frequency feature data represented by a sound pressure level value for each frequency band, compares the analysis frequency feature data, for each frequency band, with each of the reference frequency feature data stored in the reference data storage unit, and, when there is reference frequency feature data corresponding to the analysis frequency feature data, extracts the event content data corresponding to that reference frequency feature data;
an audio information generation unit that acquires the calculated sound pressure level values from the present back to a predetermined period before, generates audio information display data for displaying them as a graph on a time axis, and, when any event content data has been extracted by the audio analysis unit, places a mark indicating the occurrence of an event at the position on the time axis corresponding to the time of extraction and associates the extracted event content data with the mark;
a data superimposing unit that superimposes the generated audio information display data on the received video data to generate display data for display; and
a video output unit that outputs the generated display data to the display device.

A monitoring information generation device in a monitoring system comprising a monitoring camera that images a monitoring area, a microphone that collects sound in the monitoring area and its vicinity, a display device, and the monitoring information generation device, the monitoring information generation device comprising:
a data receiving unit that receives video data of the monitoring area captured by the monitoring camera and audio data of the sound in the monitoring area and its vicinity collected by the microphone;
a reference data storage unit that stores in advance a plurality of reference audio data, each consisting of reference variation pattern data, which indicates a variation pattern over a predetermined period of the sound pressure level value of typical audio data corresponding to a sound-generating event, and event content data, which is text data indicating the content of the sound-generating event;
a sound pressure level value calculation unit that acquires the received audio data and calculates a sound pressure level value of the audio data for each predetermined period;
an audio analysis unit that acquires the calculated sound pressure level values as analysis target data, compares the analysis target data, for each predetermined time width, with each of the reference variation pattern data stored in the reference data storage unit, and, when there is reference variation pattern data corresponding to the analysis target data, extracts the event content data corresponding to that reference variation pattern data;
an audio information generation unit that acquires the calculated sound pressure level values from the present back to a predetermined period before, generates audio information display data for displaying them as a graph on a time axis, and, when any event content data has been extracted by the audio analysis unit, places a mark indicating the occurrence of an event at the position on the time axis corresponding to the time of extraction and associates the event content data extracted by the audio analysis unit with the mark;
a data superimposing unit that superimposes the generated audio information display data on the received video data to generate display data for display; and
a video output unit that outputs the generated display data to the display device.