JP2020064362A

JP2020064362A - Photobook production system and server device

Info

Publication number: JP2020064362A
Application number: JP2018194471A
Authority: JP
Inventors: Asami Yokoyama (横山 亜紗実); Daisuke Miyamoto (宮本 大輔)
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2020-04-23

Abstract

To provide a system and a server device capable of editing and ordering a photobook by combining voice operation and touch panel operation, using a smart speaker equipped with a touch panel. SOLUTION: The photobook production system includes a server device 1 and a smart speaker 2 having a voice output unit, a sound collection unit that collects the user's speech, and a touch panel. The server device includes a dialogue processing unit that interprets the user's voice input via the smart speaker and outputs a response sentence via the smart speaker, and a photobook editing unit that, when the user's voice is a photobook production instruction, produces photobook data using images received from a user terminal and transmits a preview screen of the photobook data to the smart speaker. The editing unit updates the photobook data and the preview screen based on a voice instruction or an operation on the touch panel displaying the preview screen. SELECTED DRAWING: Figure 8

Description

The present invention relates to a photobook production system and a server device.

Images taken with digital cameras, smartphones, and similar devices are commonly uploaded to and stored on a network. Services that print uploaded images to produce printed matter such as photobooks and posters are also known (see, for example, Patent Document 1).

Conventionally, a user ordering a photobook from uploaded images has operated a personal computer or smartphone to select and arrange images, enter comments, and so on.

In recent years, smart speakers that provide an AI assistant supporting interactive voice operation have become widespread. By speaking to a smart speaker, a user can perform various actions such as looking things up with a search engine, having news read aloud, playing music and videos, and operating home appliances.

Patent Document 1: JP 2005-339214 A

An object of the present invention is to provide a photobook production system and a server device that allow a photobook to be edited and ordered by combining voice operation and touch panel operation, using a smart speaker equipped with a touch panel.

A photobook production system according to the present invention includes: a server device that stores image data received from a user terminal; and a smart speaker that is communicably connected to the server device and has a voice output unit, a sound collection unit that collects the user's speech, and a touch panel. The server device has: a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker; an image selection unit that, when the user's voice is a photobook production instruction, selects a plurality of images from the images received from the user terminal; and an edit processing unit that generates photobook data using the selected images and transmits a preview screen of the photobook data to the smart speaker. The edit processing unit updates the photobook data and the preview screen based on a voice instruction from the user or an operation on the touch panel displaying the preview screen.

In one aspect of the present invention, when the user's voice is a page switching instruction for the preview screen, when a page switching button in the preview screen displayed on the touch panel is pressed, or when a swipe operation is performed on the touch panel, the edit processing unit transmits a preview screen of the corresponding page to the smart speaker.

In one aspect of the present invention, when an image displayed on the touch panel is pressed for a predetermined time or longer, the edit processing unit transmits an image list screen built from the stored images to the smart speaker, and replaces the pressed image with the image selected via the image list screen.

In one aspect of the present invention, when the user's voice instructing enlargement or reduction is received after an image displayed on the touch panel has been tapped, or when a pinch-out or pinch-in operation is performed on the touch panel, the edit processing unit enlarges or reduces the selected image.

In one aspect of the present invention, when the user's voice instructing rotation is received after an image displayed on the touch panel has been tapped, or when an arc-drawing operation is performed on the touch panel, the edit processing unit rotates the selected image.

A server device according to the present invention is communicably connected to a smart speaker having a voice output unit, a sound collection unit that collects the user's speech, and a touch panel. The server device has: a storage unit that stores images received from a user terminal; a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker; an image selection unit that, when the user's voice is a photobook production instruction, selects a plurality of images from the images received from the user terminal; and an edit processing unit that generates photobook data using the selected images and transmits a preview screen of the photobook data to the smart speaker. The edit processing unit updates the photobook data and the preview screen based on a voice instruction from the user or an operation on the touch panel displaying the preview screen.

According to the present invention, a photobook can be edited and ordered by combining voice operation and touch panel operation, using a smart speaker equipped with a touch panel.

FIG. 1 is a schematic diagram of a photobook production system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a photobook production instruction.
FIG. 3 is a diagram showing an example of a photobook preview screen.
FIG. 4(a) is a diagram showing an example of how an image to be changed is selected, and FIG. 4(b) is a diagram showing an example of an image list screen.
FIG. 5 is a diagram showing an example of a photobook editing instruction.
FIG. 6 is a diagram showing an example of a photobook editing instruction.
FIG. 7 is a diagram showing an example of a photobook final confirmation screen.
FIG. 8 is a block configuration diagram of the photobook production system.

Embodiments of the present invention will be described below with reference to the drawings.

As shown in FIG. 1, the photobook production system according to the embodiment of the present invention includes a server device 1 and a smart speaker 2. The server device 1 can communicate with the smart speaker 2 and a user terminal 3 via a communication network such as the Internet.

The user terminal 3 is a smartphone, tablet terminal, or the like owned by the user. The server device 1 stores image data uploaded by the user. The smart speaker 2 is a speaker having a communication function, an assistant function for interactive voice operation, and a touch panel.

The server device 1 assigns the same identification information (user ID) identifying the user to the smart speaker 2 and the user terminal 3. The server device 1 therefore recognizes that the smart speaker 2 and the user terminal 3 are used by the same user. The image upload described above may be performed from the user terminal 3 or from another terminal logged in with the same user ID.

The server device 1 has an image analysis function and analyzes the images uploaded by the user. The image analysis includes, for example, detecting objects and text in an image, detecting images that may offend public order and morals, and detecting images that may infringe copyright. For example, the server device 1 searches the web for similar images to determine whether an image infringes copyright. The server device 1 classifies images that offend public order and morals or that may infringe copyright as inappropriate images that should not be used in a photobook, and classifies all other images as usable in a photobook.

In the present embodiment, the user starts the photobook production process by speaking to the smart speaker 2 and checks the preview screen displayed on the touch panel 24 of the smart speaker 2. The user then performs photobook editing, such as changing images, by operating the touch panel 24 or speaking to the smart speaker 2.

For example, as shown in FIG. 2, the user calls the smart speaker 2 with a predetermined wake word (command word) such as "OK, speaker" to activate the assistant function. The user then asks the smart speaker 2 to make a photobook, for example by saying "Make a photobook with the photos from the December trip."

The smart speaker 2 transmits the voice data of the user's utterance following the wake word to the server device 1. The server device 1 interprets the user's utterance and starts the photobook production process. The server device 1 selects images from those uploaded by this user and places them in a photobook template. For example, the server device 1 refers to the EXIF data of the stored images, narrows them down to the photos of the December trip based on the shooting date and shooting location, and selects images suitable for a photobook, for example based on composition and brightness.
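The narrowing and selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the `Photo` class, its `taken_at` and `score` fields, and the function `select_for_photobook` are hypothetical names, the shooting date is assumed to have been read from EXIF data beforehand, and the quality score stands in for whatever composition and brightness criteria the server device actually uses.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Photo:
    name: str
    taken_at: datetime  # assumed to come from the EXIF shooting date
    score: float        # hypothetical quality score (composition, brightness)


def select_for_photobook(photos, month, limit):
    """Keep photos taken in `month`, then return the `limit` best-scoring ones."""
    candidates = [p for p in photos if p.taken_at.month == month]
    candidates.sort(key=lambda p: p.score, reverse=True)
    return candidates[:limit]


photos = [
    Photo("beach.jpg", datetime(2018, 12, 24, 10, 0), 0.9),
    Photo("dark.jpg", datetime(2018, 12, 24, 22, 0), 0.2),
    Photo("office.jpg", datetime(2018, 11, 2, 9, 0), 0.8),
    Photo("dinner.jpg", datetime(2018, 12, 25, 19, 0), 0.7),
]
selected = select_for_photobook(photos, month=12, limit=2)
print([p.name for p in selected])
```

A real implementation would presumably also filter by shooting location and exclude images already classified as inappropriate; those steps are omitted here.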

After laying out the selected images and creating the photobook data, the server device 1 generates a response sentence (voice data) and transmits it to the smart speaker 2 together with the photobook preview screen. The smart speaker 2 displays the preview screen on the touch panel 24 and outputs speech asking the user to check it, for example "Displaying the preview. Please check it."

For example, a photobook preview screen such as the one shown in FIG. 3 is displayed on the touch panel 24 of the smart speaker 2.

For example, the preview screen displays the images of each spread (two facing pages). When the user presses the page switching button B1 or B2 or swipes with a finger, the smart speaker 2 requests the server device 1 to display the next or previous page. The server device 1 transmits the preview screen of the next or previous page to the smart speaker 2, and the page changes when that screen is displayed on the touch panel 24.

The displayed page can also be switched by voice. For example, the user says "Show page 5" or "Turn the page" to the smart speaker 2. The server device 1 obtains the user's utterance via the smart speaker 2, interprets it, and transmits the preview screen of the corresponding page to the smart speaker 2.
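Because button presses, swipes, and utterances all end in the same request for a page, the server side can treat them uniformly once they are normalized. The sketch below assumes a hypothetical normalized `action` value (`"next"`, `"prev"`, or `"goto"`) and a hypothetical function name `resolve_page`; clamping to the valid page range is also an assumption, since the source does not say what happens at the first or last page.

```python
def resolve_page(current, total, action, page=None):
    """Return the page number whose preview should be sent next.

    `action` is a hypothetical normalized command derived from a button
    press, a swipe, or an interpreted utterance.
    """
    if action == "goto" and page is not None:
        target = page           # e.g. "Show page 5"
    elif action == "next":
        target = current + 1    # e.g. "Turn the page"
    elif action == "prev":
        target = current - 1
    else:
        target = current        # unknown command: stay on the current page
    # Assumption: requests outside the book are clamped to the valid range.
    return max(1, min(total, target))


print(resolve_page(3, 10, "next"))
print(resolve_page(3, 10, "goto", page=5))
```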

To change an image used in the photobook, the user long-presses the target image on the touch panel 24 with a finger, as shown in FIG. 4(a), and the smart speaker 2 sends an image change request to the server device 1. On receiving the request, the server device 1 transmits an image list screen and a response sentence to the smart speaker 2. The smart speaker 2 outputs the speech "Please choose a photo from the list screen" and displays the image list screen on the touch panel 24, as shown in FIG. 4(b).

When the user selects an image from the image list screen displayed on the touch panel 24, the server device 1 changes the image and updates the photobook data and the preview screen.

An image can also be changed by voice. For example, the user says "Change the photo on page 3" to the smart speaker 2. The server device 1 obtains the user's utterance via the smart speaker 2, interprets it, and transmits the image list screen and a response sentence to the smart speaker 2.

The image list screen contains every image uploaded to the album specified by the user. It therefore also includes inappropriate images that the image analysis of the server device 1 judged should not be used in a photobook. When the user selects an inappropriate image from the list screen, the server device 1 generates a response sentence asking about the source of the image and transmits it to the smart speaker 2. For example, the smart speaker 2 outputs speech such as "The photo on page 6 contains character content. Did you take this photo yourself?"

If the user gives an affirmative reply such as "Yes, I did," the server device 1 generates a response sentence approving the use of the selected image and transmits it to the smart speaker 2. If the user gives a negative reply such as "No," the server device 1 generates a response sentence explaining that using this image is problematic, has the smart speaker 2 output it as speech, and returns the display on the touch panel 24 to the image list screen.
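The confirmation dialogue for a flagged image amounts to a small three-way decision, sketched below. The function name `handle_image_choice`, the `flagged` key, and the response wording are hypothetical; only the branching follows the description above.

```python
def handle_image_choice(image, user_took_photo=None):
    """Decide whether a selected image may be used.

    `image` is a hypothetical dict with a boolean "flagged" entry set by
    image analysis. `user_took_photo` is None until the user has answered
    the question about the image's source.
    Returns (accepted, response_text).
    """
    if not image["flagged"]:
        return True, "The photo has been changed."
    if user_took_photo is None:
        # First pass: ask about the source before accepting the image.
        return False, "Did you take this photo yourself?"
    if user_took_photo:
        return True, "Understood. The photo will be used."
    # Negative answer: reject and send the user back to the list screen.
    return False, "This photo cannot be used. Please choose another one."


flagged = {"name": "mascot.jpg", "flagged": True}
print(handle_image_choice(flagged))
print(handle_image_choice(flagged, user_took_photo=True))
```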

To enlarge or reduce an image, the user taps the target image on the touch panel 24 with a finger to select it, as shown in FIG. 5, and says "Enlarge" (or "Reduce") to the smart speaker 2. The server device 1 obtains the user's utterance via the smart speaker 2, interprets it, enlarges (or reduces) the selected image, and updates the preview screen and the photobook data.

The server device 1 enlarges (or reduces) the image by a predetermined magnification each time the user says "Enlarge" (or "Reduce"). Alternatively, the server device 1 may enlarge (or reduce) the image continuously from the time the user says "Enlarge" (or "Reduce") until the user says "Stop."

The user may also enlarge or reduce the image by pinching out or pinching in on the touch panel 24 with the fingers.

To rotate an image, the user taps the target image on the touch panel 24 with a finger to select it, as shown in FIG. 6, and says "Rotate right" (or "Rotate left") to the smart speaker 2. The server device 1 obtains the user's utterance via the smart speaker 2, interprets it, rotates the selected image, and updates the preview screen and the photobook data.

The server device 1 rotates the image by a predetermined angle each time the user says "Rotate right" (or "Rotate left"). The user may also specify the direction and angle of rotation, for example by saying "Rotate 90° to the right." Alternatively, the server device 1 may rotate the image continuously to the right (or left) from the time the user says "Rotate right" (or "Rotate left") until the user says "Stop."
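The two rotation variants (a fixed step per utterance, or an explicitly spoken angle) can be illustrated with a small parser. This is a sketch only: it works on English text rather than the Japanese utterances of the embodiment, `DEFAULT_STEP_DEG` is a hypothetical value for the "predetermined angle," and `rotation_delta` is a hypothetical function name.

```python
import re

DEFAULT_STEP_DEG = 15  # hypothetical value for the "predetermined angle"


def rotation_delta(utterance):
    """Return signed degrees to rotate (positive = right), or None.

    None means the utterance is not a rotation command.
    """
    text = utterance.lower()
    if "rotate" not in text:
        return None
    direction = re.search(r"\b(right|left)\b", text)
    if direction is None:
        return None
    # Use the spoken angle if there is one, otherwise the default step.
    angle = re.search(r"(\d+)\s*(?:°|degrees?)", text)
    degrees = int(angle.group(1)) if angle else DEFAULT_STEP_DEG
    return degrees if direction.group(1) == "right" else -degrees


print(rotation_delta("Rotate right"))
print(rotation_delta("Rotate 90° to the right"))
```

The continuous mode (rotate until "Stop") would need a timer or loop on top of this and is not shown.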

The user may also rotate the image by an arbitrary angle by drawing an arc with a finger without lifting it from the touch panel 24. Performing a drag-and-drop operation instead of drawing an arc moves the image.

When selecting images and creating the photobook data, the server device 1 can generate a comment (or title) and attach it to a selected image. The server device 1 generates comments based on the objects and text detected in the image by image analysis, the purpose of the photobook, and so on. For example, the server device 1 generates and attaches a comment to an image in which a distinctive object has been detected.

When a page with a comment is displayed on the touch panel 24, the server device 1 generates a response sentence announcing that a comment was generated and attached, together with its content, and transmits it to the smart speaker 2. The smart speaker 2 reads the comment aloud. The user checks the comment by listening to the speech output from the smart speaker 2.

To change a comment, the user gives the instruction by voice. For example, the user says "Change comment" to the smart speaker 2 and then speaks the new comment. The server device 1 obtains the user's utterance via the smart speaker 2, interprets it, changes the comment of the currently displayed image, and updates the photobook data and the preview screen.

After the user has gone through the photobook preview screens and finished editing, such as changing images, the user either says a predetermined phrase such as "Place the order" or presses the confirm button B5 on the preview screen (see FIG. 3). The server device 1 then generates a final confirmation screen and a response sentence and transmits them to the smart speaker 2. The smart speaker 2 outputs speech such as "Please make a final check of the photos and comments on the screen."

The smart speaker 2 displays a final confirmation screen FC such as the one shown in FIG. 7. For example, the lower part of the screen shows an edit list containing the image selector, comment creator, and comment content for each page, and the upper part of the screen shows the selected page. In the image selector and comment creator columns of the edit list, the entries "AI automatic selection" and "AI automatic creation" refer to the server device 1.

As described above, the server device 1 automatically selects and arranges images, creates comments, and so on. Therefore, as shown in FIG. 7, the edit list highlights the page number of every page whose image was selected by the server device 1 and whose comment was either created by the server device 1 or is absent. The user may have overlooked these pages.
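The highlighting rule is a simple predicate over the edit list. In the sketch below, the field names `image_by` and `comment_by`, the values `"ai"` and `"user"`, and the function name `needs_highlight` are hypothetical; the condition itself follows the rule stated above.

```python
def needs_highlight(image_by, comment_by):
    """Return True if the page number should be highlighted.

    image_by:   "ai" or "user" (who selected the image)
    comment_by: "ai", "user", or None when the page has no comment
    """
    return image_by == "ai" and comment_by in ("ai", None)


pages = [
    {"page": 1, "image_by": "ai", "comment_by": "ai"},
    {"page": 2, "image_by": "user", "comment_by": "ai"},
    {"page": 3, "image_by": "ai", "comment_by": None},
    {"page": 4, "image_by": "ai", "comment_by": "user"},
]
highlighted = [
    p["page"] for p in pages if needs_highlight(p["image_by"], p["comment_by"])
]
print(highlighted)
```

Pages 2 and 4 are not highlighted because the user touched either the image or the comment there, so the user has already looked at them.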

Pages on which the user selected (changed or replaced) an image or created a comment have already been reviewed by the user, so they do not need to be highlighted.

After checking the highlighted pages, the user either says a predetermined phrase such as "Place the order" or presses the confirm button B7 on the final confirmation screen. The server device 1 then generates a payment screen and transmits it to the user terminal 3, and generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs speech such as "Proceeding to payment. Please enter the remaining information on your terminal." The payment screen requires the photobook delivery address and a credit card number, for which manual input on the user terminal 3 is preferable to voice input via the smart speaker 2.

When payment information has been entered by the user (on the user terminal 3) and the photobook order has been accepted, the server device 1 transmits the photobook data and the order details to a factory 5.

A printer (not shown) installed in the factory 5 prints based on the received photobook data to produce a photobook 6. The photobook 6 produced in the factory 5 is delivered to the user.

The photobook data to be sent to the factory 5 may also be sent to a review terminal 4 for manual review. This makes it possible to catch inappropriate images that the image analysis of the server device 1 missed.

As described above, according to the present embodiment, the user can instruct the production of a photobook by voice via the smart speaker 2. Editing such as replacing automatically laid-out images can be performed by combining voice operation via the smart speaker 2 with touch panel operation, so the user can order a photobook easily.

The server device 1 analyzes the images uploaded by the user and identifies in advance inappropriate images, such as images that may offend public order and morals or may infringe copyright. When producing a photobook, the server device 1 can therefore select from the images that are safe to use, excluding the inappropriate ones.

If the user selects an inappropriate image from the image list screen during image change processing, the server device 1 asks the user about the source of the image via the smart speaker 2, and uses the image in the photobook once it is confirmed that there is no problem.

FIG. 8 is a block configuration diagram of the photobook production system. As shown in FIG. 8, the smart speaker 2 has a control unit 20, a sound collection unit (microphone) 21, a voice output unit (speaker) 22, a communication unit 23, and a touch panel 24.

The control unit 20 has a voice recognition function. When a predetermined wake word is input via the sound collection unit 21, the control unit 20 transmits the speech following the wake word to the server device 1 using the communication unit 23. The control unit 20 also transmits operation information input by the user via the touch panel 24 to the server device 1.
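The wake-word gating performed by the control unit can be illustrated as below. This sketch makes a simplifying assumption: it operates on already-recognized text, whereas the smart speaker forwards audio. The constant `WAKE_WORD` uses the example phrase from the description, and `speech_to_forward` is a hypothetical function name.

```python
WAKE_WORD = "ok, speaker"  # example wake word from the description


def speech_to_forward(transcript):
    """Return the text following the wake word, or None if it is absent.

    Nothing is forwarded to the server unless the wake word was heard.
    """
    index = transcript.lower().find(WAKE_WORD)
    if index == -1:
        return None
    rest = transcript[index + len(WAKE_WORD):]
    # Drop the punctuation and spaces that separate the wake word
    # from the actual request.
    return rest.strip(" ,.")


print(speech_to_forward("OK, speaker. Make a photobook"))
print(speech_to_forward("Make a photobook"))
```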

The voice output unit 22 outputs the voice data of the response sentence received from the server device 1 via the communication unit 23.

The touch panel 24 displays image data, such as the preview screen, received from the server device 1 via the communication unit 23. The touch panel 24 also accepts operation instructions from the user, such as selecting and scaling images.

The server device 1 includes a dialogue processing unit 10 and a photobook editing unit 100.

The dialogue processing unit 10 understands voice instructions from the user and generates appropriate response sentences for the user. It has an input understanding unit 11, a dialogue management unit 12, and an output generation unit 13. The input understanding unit 11 provides intent estimation, which estimates the user's intent (task) from the utterance received from the smart speaker 2, and named entity extraction, which extracts expressions such as proper nouns (person names, place names, and the like), dates, and times from the utterance.

The dialogue management unit 12 provides internal state update, which writes the result information received from the input understanding unit 11 into an internal state that serves as a database, and action selection, which chooses the next action based on the internal state and a dialogue strategy (action selection rules).

The output generation unit 13 generates a response sentence that matches the instruction issued by the action selection of the dialogue management unit 12 and transmits it to the smart speaker 2.
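The three stages of the dialogue processing unit (input understanding, dialogue management, output generation) can be sketched as three small functions. Everything here is a toy stand-in under stated assumptions: the keyword matching replaces real intent estimation and named entity extraction, the intent and action names are hypothetical, the utterances are English rather than Japanese, and the response texts other than the preview message are invented for illustration.

```python
import re


def understand(utterance):
    """Input understanding: estimate the intent and extract a page number."""
    text = utterance.lower()
    if "make" in text and "photobook" in text:
        intent = "create_photobook"
    elif "page" in text:
        intent = "switch_page"
    else:
        intent = "unknown"
    match = re.search(r"page\s+(\d+)", text)
    return {"intent": intent, "page": int(match.group(1)) if match else None}


def manage(state, result):
    """Dialogue management: update the internal state, then select an action."""
    state["last_intent"] = result["intent"]
    if result["page"] is not None:
        state["page"] = result["page"]
    # Hypothetical action selection rules (the "dialogue strategy").
    rules = {"create_photobook": "show_preview", "switch_page": "show_page"}
    return rules.get(result["intent"], "ask_again")


RESPONSES = {
    "show_preview": "Displaying the preview. Please check it.",
    "show_page": "Here is the page.",
    "ask_again": "Sorry, could you say that again?",
}


def generate(action):
    """Output generation: produce the response sentence for the action."""
    return RESPONSES[action]


state = {}
action = manage(state, understand("Show page 5"))
print(action, state, generate(action))
```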

The dialogue processing unit 10 works together with the photobook editing unit 100 and can reflect the processing results of the photobook editing unit 100 in its response sentences.

The photobook editing unit 100 includes a storage unit 110 that holds an image DB, an analysis result DB, a photobook data DB, an order content DB, various programs, and so on.

The functions of an image receiving unit 101, an image analysis unit 102, an image selection unit 103, an edit processing unit 104, a comment generation unit 105, and an order processing unit 106 are realized by a CPU (central processing unit) executing the programs stored in the storage unit 110.

The image receiving unit 101 receives image data uploaded from the user terminal 3 and stores it in the image DB.

The image analysis unit 102 analyzes the images uploaded from the user terminal 3. The image analysis includes, for example, detecting objects and text in an image, detecting images that may offend public order and morals, and detecting images that may infringe copyright, such as those showing celebrities or well-known characters. The image analysis unit 102 classifies images that offend public order and morals or that may infringe copyright as inappropriate images that should not be used in a photobook, and links the result to the image data in the image DB. For each image, the image analysis unit 102 stores the objects and text detected in the image, together with the shooting date, shooting location, and other information contained in the image data, as tags in the analysis result DB.

When the dialogue processing unit 10 extracts the task of producing a photobook from the user's utterance, the image selection unit 103 selects images suitable for the photobook from the image DB. When the user's utterance includes a date and time, the image selection unit 103 selects images by referring to the shooting date information included in the image data.
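The date-based selection can be sketched as follows. This is a minimal example under stated assumptions: the function name `select_images_by_date`, the tuple layout, and the `limit` parameter are hypothetical, and the utterance is assumed to have already been converted into a start and end date by the dialogue processing unit 10.

```python
from datetime import date
from typing import List, Optional, Tuple


def select_images_by_date(images: List[Tuple[str, date]],
                          start: Optional[date] = None,
                          end: Optional[date] = None,
                          limit: int = 20) -> List[str]:
    """Return IDs of images whose shooting date lies within [start, end].

    images holds (image_id, shooting_date) pairs. A missing bound means
    that side of the range is open. limit is an assumed page budget.
    """
    chosen = []
    for image_id, shot in images:
        if start is not None and shot < start:
            continue
        if end is not None and shot > end:
            continue
        chosen.append(image_id)
    return chosen[:limit]
```

For an utterance such as "make a photobook of August 2018", the caller would pass `date(2018, 8, 1)` and `date(2018, 8, 31)` as the bounds.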

The edit processing unit 104 arranges the images selected by the image selection unit 103 in a predetermined template, generates photobook data, and transmits a preview screen to the smart speaker 2. The generated photobook data is stored in the photobook data DB.

When the dialogue processing unit 10 extracts a page switching task from a user utterance such as "show me page 5", "turn the page", or "go back to the previous page", the edit processing unit 104 transmits the preview screen of the corresponding page to the smart speaker 2. Likewise, when the smart speaker 2 reports operation information indicating that the page switching button B1 or B2 has been pressed or that a swipe operation has been performed, the edit processing unit 104 transmits the preview screen of the page corresponding to that operation to the smart speaker 2.
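Because voice tasks and touch panel operations lead to the same result, they can be reduced to a single page-resolution step. The sketch below assumes a hypothetical event format and hypothetical event names; in particular, which button or swipe direction means "next" is an assumption, since the description does not fix it here.

```python
def resolve_page(current: int, total: int, event: dict) -> int:
    """Return the page whose preview should be sent for a voice or touch event.

    event["kind"] is a hypothetical label produced either by the dialogue
    processing unit (voice) or from operation information (touch).
    """
    kind = event["kind"]
    if kind == "show_page":  # e.g. "show me page 5"
        target = event["page"]
    elif kind in ("turn_page", "next_button", "swipe_left"):
        target = current + 1
    elif kind in ("previous_page", "prev_button", "swipe_right"):
        target = current - 1
    else:
        target = current  # unrelated event: stay on the same page
    # Clamp so that a request past either end keeps a valid page.
    return max(1, min(total, target))
```

For example, `resolve_page(3, 10, {"kind": "show_page", "page": 5})` yields page 5, and a "turn the page" request on the last page leaves the page unchanged.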

When the smart speaker 2 reports operation information indicating that an image on the touch panel 24 has been pressed for a predetermined time or longer, the edit processing unit 104 creates an image list screen from the images in the image DB and transmits it to the smart speaker 2. When the smart speaker 2 reports which image was selected on the image list screen, the edit processing unit 104 replaces the image and updates the photobook data and the preview screen.

Similarly, when the dialogue processing unit 10 extracts an image change task from a user utterance such as "change the photo on page 3", the edit processing unit 104 transmits the image list screen to the smart speaker 2.

When an inappropriate image is selected from the image list screen, the edit processing unit 104 notifies the dialogue processing unit 10 that an inappropriate image has been selected. The dialogue processing unit 10 generates a question sentence asking about, for example, the source of the inappropriate image, and causes the smart speaker 2 to output it.
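The handling of a selection made on the image list screen might be sketched as below. The function name, the slot list, and the wording of the question are hypothetical. Holding back the replacement until the user answers is also an assumption, because the description only states that the dialogue processing unit 10 is notified and asks a question.

```python
from typing import List, Optional, Set, Tuple


def on_image_chosen(slots: List[str],
                    slot_index: int,
                    chosen_id: str,
                    inappropriate_ids: Set[str]) -> Tuple[List[str], Optional[str]]:
    """Replace the image in one layout slot, or return a question to ask.

    Returns (updated_slots, question). question is None when the image was
    replaced and the preview screen can simply be refreshed.
    """
    if chosen_id in inappropriate_ids:
        # Assumption: keep the layout unchanged until the user has answered.
        return list(slots), "Where did this image come from?"
    updated = list(slots)  # copy so the caller's list is not mutated
    updated[slot_index] = chosen_id
    return updated, None
```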

When the dialogue processing unit 10 extracts a comment change task from the user's utterance, the edit processing unit 104 replaces the comment with the one entered by the user's voice and updates the photobook data and the preview screen.

When the smart speaker 2 reports operation information indicating that an image on the touch panel 24 has been tapped and then pinched out or pinched in, the edit processing unit 104 enlarges or reduces the selected image and updates the photobook data and the preview screen.

Likewise, when the smart speaker 2 has reported operation information indicating that an image on the touch panel 24 was tapped, and the dialogue processing unit 10 then extracts an image scaling task from a user utterance such as "enlarge (reduce) it", the edit processing unit 104 enlarges or reduces the selected image and updates the photobook data and the preview screen.

When the smart speaker 2 reports operation information indicating that an image on the touch panel 24 has been tapped and then subjected to an arc-drawing gesture or a drag-and-drop operation, the edit processing unit 104 rotates or moves the selected image and updates the photobook data and the preview screen.

Likewise, when the smart speaker 2 has reported operation information indicating that an image on the touch panel 24 was tapped, and the dialogue processing unit 10 then extracts an image rotation or movement task from a user utterance such as "rotate it to the right (left)" or "move it up (down)", the edit processing unit 104 rotates or moves the selected image and updates the photobook data and the preview screen.
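The "tap first, then speak" pattern used for scaling, rotation, and movement implies that the server remembers which image was tapped. A minimal sketch of that state follows, with hypothetical class names and task labels. The scale factor of 1.25, the 90-degree rotation step, and the 10-unit move step are all assumptions, since the description gives no amounts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Placement:
    scale: float = 1.0
    rotation: int = 0  # degrees clockwise
    x: int = 0
    y: int = 0


class EditSession:
    """Remembers the tapped image so that a later voice task can target it."""

    def __init__(self, placements: Dict[str, Placement]):
        self.placements = placements
        self.selected: Optional[str] = None

    def tap(self, image_id: str) -> None:
        self.selected = image_id

    def apply(self, task: str) -> bool:
        """Apply a voice task to the selected image; False if nothing applies."""
        if self.selected is None:
            return False  # no image tapped yet
        p = self.placements[self.selected]
        if task == "enlarge":
            p.scale *= 1.25
        elif task == "reduce":
            p.scale /= 1.25
        elif task == "rotate_right":
            p.rotation = (p.rotation + 90) % 360
        elif task == "rotate_left":
            p.rotation = (p.rotation - 90) % 360
        elif task == "move_up":
            p.y -= 10
        elif task == "move_down":
            p.y += 10
        else:
            return False
        return True  # caller would now refresh the photobook data and preview
```

A pinch or arc gesture reported by the smart speaker 2 could be routed through the same `apply` call, which keeps the voice and touch paths consistent.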

The edit processing unit 104 also creates a final confirmation screen FC as shown in FIG. 7.

The comment generation unit 105 automatically generates a comment for each image used in the photobook and attaches it to the image. Each image is tagged with the objects and text detected in it by image analysis, as well as the shooting date information, shooting location information, and the like included in the image data. The comment generation unit 105 generates comments using these tags.
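Tag-based comment generation could be as simple as the template below. This is only a sketch under stated assumptions: the tag dictionary layout and the phrase template are hypothetical, and the embodiment does not say how the comment text is actually composed.

```python
def generate_comment(tags: dict) -> str:
    """Compose a short caption from analysis tags.

    tags may contain 'objects' (list of str), 'place' (str) and 'date' (str);
    any of them may be absent.
    """
    parts = []
    if tags.get("objects"):
        parts.append(" and ".join(tags["objects"]))
    if tags.get("place"):
        parts.append(f"at {tags['place']}")
    if tags.get("date"):
        parts.append(f"on {tags['date']}")
    return " ".join(parts)


print(generate_comment({"objects": ["dog", "beach"],
                        "place": "Okinawa",
                        "date": "2018-08-01"}))
# prints: dog and beach at Okinawa on 2018-08-01
```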

The order processing unit 106 accepts an order, including the number of copies of the photobook to be printed, from the user terminal 3. The print order is completed when the order processing unit 106 accepts the payment information and performs payment processing. The order processing unit 106 stores the order content, including the number of copies and the delivery address, in the order content DB.
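The order flow, in which an order counts as complete only after payment succeeds, might be sketched as follows. The `Order` fields and the `place_order` function are hypothetical, a plain list stands in for the order content DB, and the payment result is assumed to come from an external payment step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    photobook_id: str
    copies: int
    address: str
    paid: bool = False


def place_order(order_db: List[Order], order: Order, payment_ok: bool) -> bool:
    """Store the order content only when payment processing has succeeded."""
    if order.copies < 1:
        raise ValueError("at least one copy must be ordered")
    if not payment_ok:
        return False  # order is not completed and nothing is stored
    order.paid = True
    order_db.append(order)
    return True
```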

The photobook data for which the print order was placed and the order content are transmitted to the factory 5, where the photobook 6 is manufactured.

As described above, according to the present embodiment, the user can start the photobook production process by speaking to the smart speaker 2 and can easily edit the photobook by combining voice operation and touch panel operation via the smart speaker 2.

In the above embodiment, the preview screen of the photobook may be displayed on the user terminal 3 as well as on the touch panel 24 of the smart speaker 2.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.

1 Server device
2 Smart speaker
3 User terminal
4 Examination terminal
5 Factory
6 Photobook

Claims

A photobook production system comprising:
a server device that stores image data received from a user terminal; and
a smart speaker that is communicatively connected to the server device and has a voice output unit, a sound collection unit that collects the user's uttered voice, and a touch panel,
wherein the server device includes:
a dialogue processing unit that interprets the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker;
an image selection unit that selects a plurality of images from the images received from the user terminal when the user's voice is an instruction to produce a photobook; and
an edit processing unit that generates photobook data using the selected images and transmits a preview screen of the photobook data to the smart speaker, and
wherein the edit processing unit updates the photobook data and the preview screen based on an instruction given by the user's voice or an operation on the touch panel displaying the preview screen.

The photobook production system according to claim 1, wherein, when the user's voice is an instruction to switch pages of the preview screen, when a page switching button in the preview screen displayed on the touch panel is pressed, or when a swipe operation is performed on the touch panel, the edit processing unit transmits the preview screen of the corresponding page to the smart speaker.

The photobook production system according to claim 1 or 2, wherein, when an image displayed on the touch panel is pressed for a predetermined time or longer, the edit processing unit transmits an image list screen built from the stored images to the smart speaker and replaces the pressed image with an image selected via the image list screen.

The photobook production system according to claim 1, wherein, when the user's voice instructing enlargement or reduction of an image is received after the image displayed on the touch panel has been tapped, or when a pinch-out or pinch-in operation is performed on the touch panel, the edit processing unit enlarges or reduces the selected image.

The photobook production system according to claim 1, wherein, when the user's voice instructing rotation of an image is received after the image displayed on the touch panel has been tapped, or when an arc-drawing operation is performed on the touch panel, the edit processing unit rotates the selected image.

A server device communicatively connected to a smart speaker having a voice output unit, a sound collection unit that collects the user's uttered voice, and a touch panel, the server device comprising:
a storage unit that stores images received from a user terminal;
a dialogue processing unit that interprets the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker;
an image selection unit that selects a plurality of images from the images received from the user terminal when the user's voice is an instruction to produce a photobook; and
an edit processing unit that generates photobook data using the selected images and transmits a preview screen of the photobook data to the smart speaker,
wherein the edit processing unit updates the photobook data and the preview screen based on an instruction given by the user's voice or an operation on the touch panel displaying the preview screen.