CN110738821A

CN110738821A - A method and system for alarming by remote camera

Info

Publication number: CN110738821A
Application number: CN201910928048.1A
Authority: CN
Inventors: 余承富
Original assignee: SHENZHEN DANALE TECHNOLOGY Co Ltd
Current assignee: Shenzhen Haique Technology Co ltd
Priority date: 2019-09-27
Filing date: 2019-09-27
Publication date: 2020-01-31

Abstract

The application provides a method for alarming by remote camera, comprising: a camera device photographs a target person to obtain a target image and sends the target image to a cloud; the cloud receives the target image sent by the camera device and generates a high-level semantic description of the target image, where the high-level semantic description is a high-level description of the target person in text form; when the high-level semantic description includes a preset keyword, the cloud generates an alarm prompt and sends it to a terminal device; and the terminal device receives the alarm prompt sent by the cloud.

Description

A method and system for alarming by remote camera

Technical Field

The present application relates to the field of monitoring and management technology, and in particular to a method and system for alarming by remote camera.

Background

Remote camera equipment plays an important role in security monitoring. At present, existing remote camera equipment uploads captured image information to the cloud, and the cloud transmits the image information to a terminal device, thereby enabling remote monitoring. However, such camera equipment usually only collects image information and cannot describe it in text by means of image recognition and semantic description. The user can only retrieve the image information stored in the cloud through the terminal device and judge subjectively what happened in the image, so the judgment is not very accurate. Moreover, existing camera equipment has no alarm or remote call function, which results in a poor user experience.

Summary

Embodiments of the present application provide a method and system for alarming by remote camera, which improve the accuracy with which a user judges events occurring in image files captured by a camera device, and improve the user's experience of interacting with the camera device through a terminal device.

In a first aspect, an embodiment of the present application provides a method for alarming by remote camera, including:

The camera device photographs a target person to obtain a target image, and sends the target image to the cloud;

The cloud receives the target image sent by the camera device and generates a high-level semantic description of the target image, where the high-level semantic description is a high-level description of the target person in text form;

When the high-level semantic description includes a preset keyword, the cloud generates an alarm prompt and sends it to the terminal device;

The terminal device receives the alarm prompt sent by the cloud.
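The four steps above can be pictured as a small message flow between camera, cloud and terminal. The following Python sketch is illustrative only: the class names, the `describe` callable (a stand-in for the semantic description model) and the keyword list are assumptions, and plain substring matching is used as one simple way to decide whether a description "includes a preset keyword".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical preset keywords, modelled on the examples given later in the description.
PRESET_KEYWORDS = ("fall", "tumble", "struggle", "cry", "convulse", "blood")


@dataclass
class AlarmPrompt:
    camera_id: str
    description: str
    matched_keywords: List[str]


def find_keywords(description: str, keywords: Sequence[str] = PRESET_KEYWORDS) -> List[str]:
    """Return the preset keywords that occur (as substrings) in the description."""
    text = description.lower()
    return [k for k in keywords if k in text]


@dataclass
class TerminalDevice:
    alarms: List[AlarmPrompt] = field(default_factory=list)

    def receive_alarm(self, prompt: AlarmPrompt) -> None:
        self.alarms.append(prompt)


@dataclass
class Cloud:
    describe: Callable[[bytes], str]  # stand-in for the semantic description model (assumption)
    terminal: TerminalDevice

    def on_target_image(self, camera_id: str, image: bytes) -> Optional[AlarmPrompt]:
        """Describe the uploaded target image; alarm the terminal if a keyword matches."""
        description = self.describe(image)
        matched = find_keywords(description)
        if not matched:
            return None
        prompt = AlarmPrompt(camera_id, description, matched)
        self.terminal.receive_alarm(prompt)
        return prompt


# Usage: a fake model that always reports a fall.
terminal = TerminalDevice()
cloud = Cloud(describe=lambda image: "An elderly person falls next to the bed", terminal=terminal)
cloud.on_target_image("camera 1", b"<jpeg bytes>")
print(terminal.alarms[0].matched_keywords)  # ['fall']
```

Substring matching is deliberately crude (it would also match "fallen"); a real deployment might match on tokens or on the model's structured output instead.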

In some possible embodiments, generating, by the cloud, the high-level semantic description of the target image includes:

The cloud extracts, from the target image, position vectors of multiple target subjects including the target person;

The cloud extracts, from the target image, pose vectors of the multiple target subjects including the target person;

The cloud inputs the position vectors and the pose vectors of the multiple target subjects into a semantic description model, thereby generating the high-level semantic description of the target image.

In some possible embodiments, the semantic description model includes a low-level semantic unit and a high-level semantic unit, and inputting the position vectors and the pose vectors of the multiple target subjects into the semantic description model to generate the high-level semantic description of the target image includes:

Inputting the position vectors of the multiple target subjects into the low-level semantic unit to obtain a low-level semantic description;

Inputting the pose vectors of the multiple target subjects and the low-level semantic description into the high-level semantic unit to obtain the high-level semantic description.
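The two-stage data flow (positions into the low-level unit, then poses plus the low-level text into the high-level unit) can be sketched as follows. This is a hand-written, rule-based stand-in, not the model of FIG. 4: the bounding-box format of the position vector, the reduction of the pose vector to a single torso angle, and the 60-degree "falls" threshold are all hypothetical choices made only to show how the two units chain together.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical position vector: bounding box (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Subject:
    label: str                 # e.g. "person", "bed"
    position: Box              # position vector
    torso_angle: float = 0.0   # pose vector reduced to one value: degrees from vertical (assumption)


def low_level_unit(subjects: List[Subject]) -> List[str]:
    """Turn position vectors into simple spatial relations ("A next to B")."""
    relations = []
    for a in subjects:
        for b in subjects:
            if a is b:
                continue
            ax, _, aw, _ = a.position
            bx, _, bw, _ = b.position
            # Horizontal gap between the two boxes (0 if they overlap).
            gap = max(bx - (ax + aw), ax - (bx + bw), 0.0)
            if gap < 0.5 * max(aw, bw):
                relations.append(f"{a.label} next to {b.label}")
    return relations


def high_level_unit(subjects: List[Subject], low_level: List[str]) -> str:
    """Combine pose vectors with the low-level description into one sentence per person."""
    sentences = []
    for s in subjects:
        if s.label != "person":
            continue
        state = "falls" if s.torso_angle > 60 else "stands"  # hypothetical threshold
        prefix = s.label + " "
        context = [r[len(prefix):] for r in low_level if r.startswith(prefix)]
        sentences.append(" ".join([s.label, state] + context[:1]))
    return "; ".join(sentences)


subjects = [Subject("person", (100, 200, 80, 40), torso_angle=80),
            Subject("bed", (200, 150, 150, 100))]
low = low_level_unit(subjects)
print(low)                              # ['person next to bed', 'bed next to person']
print(high_level_unit(subjects, low))   # person falls next to bed
```

In a learned model both units would be neural networks; the point here is only that the high-level unit sees the pose information and the low-level relations together.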

It can be seen that the embodiments of the present application can extract the position vectors and pose vectors of multiple target subjects, including the target person, in the target image, and generate a high-level semantic description from the extracted vectors. This better describes the states of and relationships between the multiple target subjects in the target image and yields a more accurate semantic description of the image.

In some possible embodiments, the method further includes:

When receiving the alarm prompt sent by the cloud, the terminal device sends a call request to the camera device;

When the terminal device accesses the network through a wireless local area network (WLAN), the terminal device establishes a video call with the camera device;

When the terminal device does not access the network through a WLAN, the terminal device establishes a voice call with the camera device.
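The call-type rule above reduces to a single branch on how the terminal is connected. The sketch below assumes the terminal already knows its access network (how it detects WLAN is platform-specific and not described here), and `send_call_request` is a hypothetical callback standing in for whatever signalling path carries the request to the camera.

```python
from enum import Enum
from typing import Callable, List


class Network(Enum):
    WLAN = "wlan"
    CELLULAR = "cellular"  # e.g. 3G/4G; any non-WLAN access


class CallType(Enum):
    VIDEO = "video"
    VOICE = "voice"


def choose_call_type(network: Network) -> CallType:
    """Video over WLAN, voice otherwise, as in the embodiment above."""
    return CallType.VIDEO if network is Network.WLAN else CallType.VOICE


def on_alarm_prompt(network: Network, send_call_request: Callable[[CallType], None]) -> CallType:
    """On an alarm, send a call request to the camera with the chosen call type."""
    call_type = choose_call_type(network)
    send_call_request(call_type)
    return call_type


sent: List[CallType] = []
on_alarm_prompt(Network.CELLULAR, sent.append)
print(sent[0].value)  # voice
```

Defaulting to voice off WLAN is presumably meant to save mobile data; the switch-to-video shortcut covers the case where the user wants video anyway.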

In some possible embodiments, when the terminal device establishes a voice call with the camera device, the method further includes:

Displaying a video call shortcut key on the voice call interface of the terminal device;

When the video call shortcut key is triggered, the terminal device stops the voice call with the camera device and establishes a video call with the camera device.
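The shortcut behaviour can be modelled as a tiny state machine on the terminal. This is a sketch under stated assumptions: `CallSession`, the key names and the event strings are hypothetical, and the actual stopping and starting of media streams is represented only by logged events.

```python
from typing import List


class CallSession:
    """Hypothetical terminal-side call session with the camera device."""

    def __init__(self, call_type: str) -> None:
        if call_type not in ("voice", "video"):
            raise ValueError(call_type)
        self.call_type = call_type
        self.active = True
        self.events: List[str] = [f"{call_type} call started"]

    def shortcut_keys(self) -> List[str]:
        # The video call shortcut key is only shown on the voice call interface.
        return ["video call", "end"] if self.call_type == "voice" else ["end"]

    def press(self, key: str) -> None:
        if not self.active or key not in self.shortcut_keys():
            return
        if key == "video call":
            # Stop the voice call first, then establish the video call.
            self.events.append("voice call stopped")
            self.call_type = "video"
            self.events.append("video call started")
        elif key == "end":
            self.events.append(f"{self.call_type} call ended")
            self.active = False


session = CallSession("voice")
session.press("video call")
print(session.events)  # ['voice call started', 'voice call stopped', 'video call started']
```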

It can be seen that, in the embodiments of the present application, a video call shortcut key can be provided on the voice call interface of the terminal device, so that the user can switch the voice call with the camera device to a video call during the call, which improves the user's interaction experience with the camera device. In addition, the embodiments of the present application do not require the user to install third-party application software on the terminal device in order to interact with the camera device, which prevents image files captured by the camera device from being obtained by third-party software and leaking the user's privacy.

In a second aspect, an embodiment of the present application provides a system for alarming by remote camera, including:

The camera device is configured to photograph a target person to obtain a target image, and send the target image to the cloud;

The cloud is configured to receive the target image sent by the camera device and generate a high-level semantic description of the target image, where the high-level semantic description is a high-level description of the target person in text form;

When the high-level semantic description includes a preset keyword, the cloud is further configured to generate an alarm prompt and send it to the terminal device;

The terminal device is configured to receive the alarm prompt sent by the cloud.

In some possible embodiments, the cloud being configured to generate the high-level semantic description of the target image includes:

The cloud is configured to extract, from the target image, position vectors of multiple target subjects including the target person;

The cloud is configured to extract, from the target image, pose vectors of the multiple target subjects including the target person;

The cloud is configured to input the position vectors and the pose vectors of the multiple target subjects into a semantic description model, so as to generate the high-level semantic description of the target image.

In some possible embodiments, the system further includes:

The terminal device is configured to send a call request to the camera device when receiving the alarm prompt sent by the cloud;

In some possible embodiments, when the terminal device establishes a voice call with the camera device, the system further includes:

In the above solution, the cloud can generate a high-level semantic description of the target image from the position vectors and pose vectors of multiple target subjects, including the target person, in the target image. If the high-level semantic description includes a preset keyword, the cloud can generate an alarm prompt and send it to the terminal device, so that the user learns of the alarm event in time, and the accuracy of judging alarm events in the target image is improved. The user can also make a video call or voice call with the camera device as needed, which improves the user's interaction experience with the camera device.

Brief Description of the Drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a remote camera alarm system according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a remote camera alarm method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a possible target image used for generating a high-level semantic description according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a semantic description model according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a video call interface between a terminal device and a camera device according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a voice call interface between a terminal device and a camera device according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a cloud according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a camera device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

It should be understood that, when used in this specification and the appended claims, the terms "including" and "comprising" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.

It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

In the method and system for alarming by remote camera according to the embodiments of the present application, the cloud performs image recognition on images captured by the camera device in real time and generates a high-level semantic description to determine whether an alarm event has occurred in the monitoring area of the camera device, and sends an alarm prompt to the terminal device so that the user learns of the alarm event in time and can take corresponding measures as soon as possible. The method and system can be applied not only to traditional area monitoring, such as monitoring of residential communities, office buildings, banks and shopping malls, but also to remote care of children and the elderly, hospital patient care, and the like, which is not specifically limited here.

First, refer to FIG. 1, which is a schematic structural diagram of a system for alarming by remote camera according to an embodiment of the present application.

As shown in FIG. 1, the remote camera alarm system of this embodiment includes a cloud 110, a terminal device 120 and a camera device 130.

In a specific embodiment of the present application, the cloud 110 may be used to establish a communication connection between the terminal device 120 and the camera device 130. For example, the terminal device 120 may send a call request to the cloud 110, and the cloud 110 forwards the call request to the camera device 130; after receiving the call request, the camera device 130 may hold a call with the terminal device 120. The call may be a voice call or a video call, which is not specifically limited here. It can be understood that the communication connection between the terminal device 120 and the camera device 130 may also be established without going through the cloud 110, which is not specifically limited here. The communication connection may be implemented wirelessly, for example over 3G, 4G or a wireless local area network (WLAN).

In a specific embodiment of the present application, taking the case where the cloud 110 establishes the communication connection between the terminal device 120 and the camera device 130 as an example: when the cloud 110 establishes the first communication connection between the terminal device 120 and the camera device 130, it needs to identify first identification information representing the terminal device 120 and second identification information representing the camera device 130. The cloud 110 identifies the terminal device 120 by the first identification information and adds the terminal device 120 to the cloud 110, and identifies the camera device 130 by the second identification information and adds the camera device 130 to the cloud 110. The cloud 110 may also store the first identification information, the second identification information, and the binding relationship between them.

After the communication connection between the terminal device 120 and the camera device 130 has been established through the above steps, it does not need to be re-established; however, the first identification information, the second identification information and the binding relationship between them may be updated or deleted as needed.

It can thus be seen that, because the cloud 110 establishes the communication connection between the terminal device 120 and the camera device 130 according to the binding relationship between the first identification information of the terminal device 120 and the second identification information of the camera device 130, a terminal device 120 that is not bound to the camera device 130 is prevented from establishing a communication connection with the camera device 130 even if it obtains the second identification information of the camera device 130. Likewise, a camera device 130 that is not bound to the terminal device 120 is prevented from establishing a communication connection with the terminal device 120 even if it obtains the first identification information of the terminal device 120.
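The binding check can be sketched as a small registry kept by the cloud. The class and method names below are hypothetical; the only behaviour taken from the text is that both devices must first be identified and added to the cloud, that bindings can be updated or deleted, and that a connection is allowed only for a bound pair, not merely for a device that knows the other's identifier.

```python
from typing import Set, Tuple


class BindingRegistry:
    """Hypothetical cloud-side registry of terminal/camera identifiers and their bindings."""

    def __init__(self) -> None:
        self.terminals: Set[str] = set()   # first identification information
        self.cameras: Set[str] = set()     # second identification information
        self.bindings: Set[Tuple[str, str]] = set()

    def register_terminal(self, first_id: str) -> None:
        self.terminals.add(first_id)

    def register_camera(self, second_id: str) -> None:
        self.cameras.add(second_id)

    def bind(self, first_id: str, second_id: str) -> None:
        if first_id not in self.terminals or second_id not in self.cameras:
            raise KeyError("both devices must join the cloud before binding")
        self.bindings.add((first_id, second_id))

    def unbind(self, first_id: str, second_id: str) -> None:
        self.bindings.discard((first_id, second_id))

    def may_connect(self, first_id: str, second_id: str) -> bool:
        # Knowing the peer's identifier is not enough; the pair must be bound.
        return (first_id, second_id) in self.bindings


registry = BindingRegistry()
registry.register_terminal("terminal 1")
registry.register_terminal("terminal 2")
registry.register_camera("camera 1")
registry.bind("terminal 1", "camera 1")
print(registry.may_connect("terminal 1", "camera 1"))  # True
print(registry.may_connect("terminal 2", "camera 1"))  # False
```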

In a specific embodiment of the present application, the cloud 110 may further include cloud storage spaces and cloud storage subspaces. When the cloud 110 stores an image file uploaded by the camera device 130 for the first time, it may create a corresponding cloud storage subspace according to the second identification information of the camera device 130; when the cloud 110 transmits an image file to the terminal device 120 for the first time, it may create a corresponding cloud storage space according to the first identification information of the terminal device 120. A cloud storage subspace stores the image files captured by a single camera device 130, and a cloud storage space holds the one or more cloud storage subspaces bound to it. Because the cloud storage space is created from the first identification information of the terminal device 120, cloud storage spaces correspond one-to-one with first identification information; likewise, because the cloud storage subspace is created from the second identification information of the camera device 130, cloud storage subspaces correspond one-to-one with second identification information. It can be understood that if the first identification information is bound to the second identification information, the corresponding cloud storage space and cloud storage subspace are bound to each other accordingly.

After the cloud storage space corresponding to the terminal device 120 and the cloud storage subspace corresponding to the camera device 130 have been created through the above steps, they do not need to be re-created; however, the cloud storage space and cloud storage subspace, as well as the image files stored in them, may be updated or deleted as needed.

It can thus be seen that the camera device 130 sends the captured target images to the cloud 110 for storage, so the user does not need to store them on the terminal device 120, which reduces memory usage on the terminal device 120. In addition, because the cloud 110 creates the cloud storage space and cloud storage subspace according to the first identification information of the terminal device 120, the second identification information of the camera device 130 and the binding relationship between them, a terminal device 120 that is not bound to the camera device 130 cannot view the image files in the cloud storage subspace of the camera device 130 even if it obtains the second identification information. Likewise, a camera device 130 that is not bound to the terminal device 120 cannot upload captured image files to the cloud storage space of the terminal device 120 even if it obtains the first identification information.
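The storage layout described above (one subspace per camera, one space per terminal that groups the subspaces of its bound cameras) can be sketched with plain dictionaries. All names are hypothetical, image files are represented by file names only, and the terminal's space is rebuilt from the current bindings on each transfer so that deleted bindings take effect immediately. That refresh policy is a design choice of this sketch, not something the description specifies.

```python
from typing import Dict, List, Set, Tuple


class CloudStorage:
    """Hypothetical layout: one subspace per camera, one space per terminal."""

    def __init__(self) -> None:
        self.subspaces: Dict[str, List[str]] = {}  # second_id -> image files of one camera
        self.spaces: Dict[str, Set[str]] = {}      # first_id -> bound subspaces (by second_id)
        self.bindings: Set[Tuple[str, str]] = set()

    def bind(self, first_id: str, second_id: str) -> None:
        self.bindings.add((first_id, second_id))

    def upload(self, second_id: str, image_file: str) -> None:
        # A camera can only write to its own subspace, created on the first upload.
        self.subspaces.setdefault(second_id, []).append(image_file)

    def files_for_terminal(self, first_id: str) -> Dict[str, List[str]]:
        # The terminal's space is created on first transfer and refreshed from the bindings,
        # so only subspaces of bound cameras are visible.
        self.spaces[first_id] = {s for (f, s) in self.bindings if f == first_id}
        return {s: list(self.subspaces.get(s, [])) for s in sorted(self.spaces[first_id])}


storage = CloudStorage()
storage.bind("terminal 1", "camera 1")
storage.upload("camera 1", "0001.jpg")
storage.upload("camera 2", "0002.jpg")
print(storage.files_for_terminal("terminal 1"))  # {'camera 1': ['0001.jpg']}
print(storage.files_for_terminal("terminal 2"))  # {}
```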

For example, the first identification information representing the terminal device 120 may be a dial-up number, an ID or the like, or a user-defined name of the terminal device 120 such as "terminal 1" or "terminal 2"; the second identification information representing the camera device 130 may be a dial-up number, an IP address or the like, or a user-defined name of the camera device 130 such as "camera 1" or "camera 2".

In a specific embodiment of the present application, after receiving an image file uploaded by the camera device 130, the cloud 110 may capture one or more target frames from the image file at a preset time interval for image recognition. After extracting the position vectors (that is, position features) and pose vectors (that is, pose features) of multiple target subjects in the target image, the cloud 110 inputs the position vectors of the multiple target subjects into the low-level semantic unit of the semantic description model to obtain a low-level semantic description, and inputs the pose vectors of the multiple target subjects together with the low-level semantic description into the high-level semantic unit of the semantic description model to obtain a high-level semantic description. If the high-level semantic description includes a preset keyword, such as "fall", "tumble", "struggle", "cry", "convulse" or "blood", the cloud 110 determines that an alarm event has occurred in the monitoring area of the camera device 130, generates an alarm prompt and sends it to the terminal device 120. Here, the low-level semantic description is a low-level textual description of the target person, and the high-level semantic description is a high-level textual description of the target person. The preset time interval may be 0.001 seconds, 0.03 seconds, 0.5 seconds, 1 second, and so on. The image file uploaded by the camera device 130 and received by the cloud 110 may be picture information (for example, multiple pictures captured at consecutive times) or video information (for example, a video of a certain length, such as 10 s). If the cloud 110 captures multiple target frames, the time intervals between adjacent frames may be equal or unequal, which is not specifically limited here.
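Sampling frames at a preset interval and stopping at the first description that contains a keyword might look like the sketch below. It covers only the equal-interval case, and `describe_frame` is a hypothetical callback standing in for decoding the frame at time `t` and running the semantic description model on it.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def sample_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Timestamps in [0, duration_s) spaced by a fixed preset interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    count = math.ceil(duration_s / interval_s - 1e-9)  # tolerate float rounding
    return [round(i * interval_s, 6) for i in range(count)]


def scan_for_alarm(duration_s: float, interval_s: float,
                   describe_frame: Callable[[float], str],
                   keywords: Sequence[str]) -> Optional[Tuple[float, str, List[str]]]:
    """Describe sampled frames in order; return the first one whose text hits a keyword."""
    for t in sample_timestamps(duration_s, interval_s):
        description = describe_frame(t)
        matched = [k for k in keywords if k in description.lower()]
        if matched:
            return t, description, matched
    return None


def fake_describe(t: float) -> str:
    # Fake model output for a 10 s clip in which the person falls at t = 4 s.
    return "person falls next to bed" if t >= 4.0 else "person stands next to bed"


print(scan_for_alarm(10.0, 0.5, fake_describe, ("fall", "blood")))
# (4.0, 'person falls next to bed', ['fall'])
```

A shorter interval catches events sooner but multiplies the number of model invocations per uploaded file, which is the main trade-off behind the range of preset values listed above.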

It can thus be seen that the cloud 110 captures target images from the image file at the preset time interval, performs image recognition and generates semantic descriptions, and sends an alarm prompt to the terminal device 120 when it determines that an alarm event has occurred. This not only improves the accuracy of judging alarm events in the target image, but also lets the user learn of an alarm event in time, without having to watch the monitoring area in real time, and take corresponding measures as soon as possible.

In a specific embodiment of the present application, after receiving the alarm prompt sent by the cloud 110, the terminal device 120 may send a call request to the camera device 130. When the terminal device 120 accesses the network through a WLAN, the terminal device 120 establishes a video call with the camera device 130 (that is, brings up the video feed of the camera device 130). When the terminal device 120 does not access the network through a WLAN, the terminal device 120 establishes a voice call with the camera device 130, and a video call shortcut key is provided on the voice call interface of the terminal device 120; if the user taps the video call shortcut key during the voice call, the voice call with the camera device 130 is terminated and a video call with the camera device 130 is established.

For example, when the terminal device 120 is connected to the network through a WLAN and receives the alarm prompt sent by the cloud 110, its call interface may display a "Start" shortcut key, an "End" shortcut key and the like. Tapping "Start" establishes a video call with the camera device 130 so that the user can view the monitoring area of the camera device 130, and tapping "End" terminates the video call. When the terminal device 120 is not connected to the network through a WLAN, its call interface may provide a "Start" shortcut key, a "Video call" shortcut key, an icon of a built-in application (such as the camera or photo album) or an "End" shortcut key. Tapping "Start" establishes a voice call with the camera device 130 and sends a voice message to it; tapping "Video call" terminates the voice call and establishes a video call with the camera device 130 so that the user can view its monitoring area; tapping the built-in application icon (such as the camera or photo album) opens that application, which shows icons for all the camera devices 130 bound to the terminal device 120, from which the user can establish a video call or voice call with any camera device 130 and switch between camera devices 130 during the call; tapping "End" terminates the voice call with the camera device 130.

在本申请具体的实施例中，终端设备120还可以查看云端110的云存储空间和云存储子空间中的图像文件并根据需要进行下载、更新或删除等，其中，云存储空间是根据第一标识信息创建的，云存储子空间是根据第二标识信息创建的，第一标识信息表示终端设备120，第二标识信息表示摄像设备130，云存储空间与云存储子空间具有绑定关系，该绑定关系可以根据第一标识信息和第二标识信息之间的绑定关系建立。In a specific embodiment of the present application, the terminal device 120 can also view the cloud storage space of the cloud 110 and the image files in the cloud storage subspace and download, update or delete as needed, wherein the cloud storage space is based on the first The identification information is created, and the cloud storage subspace is created according to the second identification information. The first identification information represents the terminal device 120, the second identification information indicates the camera device 130, and the cloud storage space and the cloud storage subspace have a binding relationship. The binding relationship may be established according to the binding relationship between the first identification information and the second identification information.

It can thus be seen that, in this embodiment, after the terminal device 120 receives the alarm prompt from the cloud 110, the user can make a video or voice call with the camera device 130 as needed, which improves the interaction between the user and the camera device 130. Because the terminal device 120 provides shortcuts such as a preset application icon and a "Start" shortcut, the user does not need to install third-party application software on the terminal device 120 to interact with the camera device 130. This prevents the image files captured by the user's camera device 130 from being obtained by third-party software and leaking the user's privacy.

In a specific embodiment of the present application, the camera device 130 is rotatable and can capture image information of its monitored area in real time to obtain image files, so as to monitor that area. An image file may be picture information (for example, several consecutively captured pictures) or video information (for example, a video clip of a certain length, such as 10 s). The camera device 130 may upload the captured image files to the cloud 110 at preset time intervals. In addition, after receiving a call request from the terminal device 120, the camera device 130 can establish a video or voice call with the terminal device 120. In this application, the camera device 130 may be a video camera, a camera module, or the like, and may be installed in a home, in a corridor, on a building in a residential community, on a street, and so on.

It can be understood that the remote camera alarm system shown in FIG. 1 may further include an audio playback device connected to the camera device 130, such as a smart speaker, a Bluetooth speaker, or any other device that can play audio files.

In addition, the remote camera alarm system shown in FIG. 1 may further include a video playback device connected to the camera device 130, such as a smart TV or any other device that can play video files.

It should also be noted that the numbers of terminal devices 120, camera devices 130 and clouds 110 in FIG. 1 are merely illustrative; any number of each may be used as the implementation requires.

In this embodiment of the present application, the cloud 110 generates a high-level semantic description of the target image from the position vectors and pose vectors of multiple target subjects, including the target person, in the target image. If the high-level semantic description contains a preset keyword, the cloud 110 generates an alarm prompt and sends it to the terminal device 120, so the user learns of the alarm event promptly and the alarm event in the target image is judged more accurately. The user can also make a video or voice call with the camera device 130 as needed, which improves the interaction between the user and the camera device 130.

The present application further provides a remote camera alarm method, in which the cloud performs image recognition on real-time images captured by the camera device, generates a high-level semantic description, and sends an alarm prompt to the terminal device when the high-level semantic description contains a preset keyword.

Refer to FIG. 2, which is a schematic flowchart of a remote camera alarm method according to an embodiment of the present application. The method includes the following steps:

S101: The camera device 130 photographs the target person to obtain a target image and sends it to the cloud 110.

In a specific embodiment of the present application, the camera device 130 is rotatable, monitors in real time, and can photograph the monitored area at a preset time interval, such as 0.003 s, 0.01 s, 0.06 s or 0.1 s; the interval is not specifically limited here. The captured image file may be picture information (for example, several consecutively captured pictures) or video information (for example, a 10 s video clip). After capturing image files, the camera device 130 can upload them at preset time intervals to the cloud storage subspace of the cloud 110 that corresponds to the second identification information of the camera device 130, and the cloud 110 can merge, forward or otherwise process the image files in that subspace. The camera device 130 may send image files to the cloud 110 wirelessly, for example over 3G, 4G or WLAN. The image files captured by the camera device 130 contain the target image, that is, an image obtained by photographing the target person, where the target person is a human who appears in the area monitored by the camera device 130. For the definitions of the second identification information and the cloud storage subspace, refer to the foregoing embodiments; they are not repeated here.

It can thus be seen that the camera device 130 monitors its area in real time, photographs it at preset time intervals, and sends the captured image files to the cloud 110 for storage. The user does not need to store them on the terminal device 120, which reduces memory usage on the terminal device 120 while still achieving real-time monitoring.

S102: The cloud 110 receives the target image sent by the camera device 130 and generates a high-level semantic description of the target image.

In a specific embodiment of the present application, after the cloud storage subspace of the cloud 110 receives the target image from the camera device 130, the cloud 110 first performs image recognition on the target image and extracts the position vectors (position features) and pose vectors (pose features) of multiple target subjects, including the target person, in the image. It then inputs the position vectors of the target subjects into the low-level semantic unit of a semantic description model to generate a low-level semantic description, and inputs the pose vectors of the target subjects together with that low-level semantic description into the high-level semantic unit of the model to generate a high-level semantic description. The low-level semantic description is a low-level textual description of the target person, and the high-level semantic description is a high-level textual description of the target person.

In a specific embodiment of the present application, the target image used to generate the high-level semantic description may be a single frame or multiple frames and may contain one or more target subjects. The target subjects may include one or more target persons and may also include animals or objects. Different target persons may perform the same or different actions, and different target subjects may be in the same or different states. FIG. 3 is a schematic diagram of one possible target image for generating a high-level semantic description according to an embodiment of the present application. In FIG. 3, the target image consists of n frames, where n is a natural number greater than or equal to 1, and the target subjects are an elderly person, shoes, steps and a basket. The elderly person is sitting on the steps, the basket has toppled over beside the steps, and one shoe is on the elderly person's foot while the other is on the ground. Combining the states of these target subjects shows that the target person, the elderly person, has fallen on the steps. In other words, the state of the target person can be inferred from the states of several target subjects in the image. FIG. 3 is only an example. In practical applications, the target person may be someone else, such as a man, a child or a baby; the target subjects may include other objects, such as a car, a tree or a house, or animals, such as a cat, a dog or a fish; there may be more target subjects; and their states may be other actions, for example a person reading, walking, watching TV or playing basketball.

The process by which the cloud 110 generates the high-level semantic description of the target image is described in detail below. It may include the following steps:

A1: The cloud 110 extracts the position vectors of multiple target subjects, including the target person, from the target image.

In a specific embodiment of the present application, the cloud 110 may extract the position vectors of the target subjects from the target image with a convolutional neural network. Taking the target image shown in FIG. 3 as an example again, FIG. 3 contains n frames of target images P_1, P_2, …, P_n.

Step 1: Input the n frames of target images P_1, P_2, …, P_n into the convolutional layer of the convolutional neural network for the convolution operation. Taking target image P_i as an example, where 1 ≤ i ≤ n and n is a natural number, inputting P_i into the convolutional layer yields the feature image of the target image, computed as follows:

$$P_i' = \sigma\left(\mathrm{conv2}(P_i, X, \text{'valid'}) + b\right)$$

where X is the convolution kernel, 'valid' is the padding mode (no padding, so the output covers only positions where the kernel lies entirely inside the image), b is the bias, conv2() denotes convolving the target image with the kernel X, σ(·) is the activation function, and P_i' is the feature image of the target image.
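As an illustration of the formula above, the following pure-Python sketch computes a 'valid' 2-D convolution in the style of MATLAB's `conv2(A, K, 'valid')`, which flips the kernel, and then adds the bias and applies an activation. The text does not name the activation function; ReLU is assumed here for illustration only, and the function names are hypothetical.

```python
def conv2_valid(image, kernel):
    """'Valid' 2-D convolution of a 2-D list, in the style of conv2(A, K, 'valid').

    The kernel is flipped (true convolution), and only positions where it lies
    entirely inside the image are computed, giving (H-kh+1) x (W-kw+1) outputs.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # rotate the kernel by 180 degrees
    out = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * flipped[i][j]
            row.append(s)
        out.append(row)
    return out


def relu(x):
    # Assumed activation function sigma.
    return x if x > 0 else 0.0


def feature_image(image, kernel, bias, activation=relu):
    # P_i' = sigma(conv2(P_i, X, 'valid') + b)
    return [[activation(v + bias) for v in row] for row in conv2_valid(image, kernel)]


print(conv2_valid([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # [[20.0]]
```

Many CNN frameworks compute cross-correlation instead (no kernel flip); for a learned kernel the two differ only by a 180° rotation of X.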

In practical applications, the parameters and functions used to obtain the feature image P_i' of target image P_i, such as the convolution kernel X (its number, size, stride, etc.), the bias b and the activation function σ(·), may be set manually according to the features to be extracted and the size of P_i. Taking the stride of the kernel X as an example, a larger target image P_i may use a larger stride and a smaller one a smaller stride; this is not specifically limited here. In this embodiment, before the convolution operation, the target image P_i may also be de-meaned, normalized or whitened as required.
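The optional preprocessing mentioned above can be sketched as follows, assuming per-image statistics. De-averaging subtracts the mean pixel value, and normalization additionally scales the result to unit standard deviation. Whitening, which would also decorrelate pixels, is omitted from this sketch, and the function names are hypothetical.

```python
import math


def demean(image):
    # De-averaging: subtract the mean pixel value of the image.
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in image]


def normalize(image, eps=1e-8):
    # Normalization: de-mean, then scale to unit standard deviation.
    # eps guards against division by zero for constant images.
    centered = demean(image)
    values = [v for row in centered for v in row]
    std = math.sqrt(sum(v * v for v in values) / len(values))
    return [[v / (std + eps) for v in row] for row in centered]
```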

For brevity, only the extraction of the feature image P_i' of target image P_i is described above. The feature images P_1', P_2', …, P_n' of the target images P_1, P_2, …, P_n are extracted in the same way and are not described again here.

Step 2: After the feature image of the target image is obtained, it is usually input into a downsampling layer for pooling to obtain a pooled image. Pooling reduces the amount of data in the feature image. The pooling may be:

$$A_i = \mathrm{AveragePooling}(P_i')$$

where A_i is the pooled image, P_i' is the feature image, and AveragePooling() denotes mean pooling.

In practical applications, max pooling, A_i = MaxPooling(P_i'), may instead be applied to the feature image to obtain the pooled image; this is not specifically limited here. Before the convolution operation, the n frames of target images may also be de-meaned, normalized or whitened. For brevity, only the pooling of feature image P_i' is described above; the feature images P_1', P_2', …, P_n' are pooled in the same way and are not described again here.
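Both pooling variants can be written as one function. This is a sketch that assumes 2 × 2 windows with stride 2, which the text does not specify; `pool2d` is a hypothetical name.

```python
def pool2d(image, size=2, stride=2, mode="average"):
    """Pool a 2-D list over size x size windows moved by `stride`.

    mode="average" gives A_i = AveragePooling(P_i');
    mode="max" gives A_i = MaxPooling(P_i').
    """
    if mode not in ("average", "max"):
        raise ValueError(f"unknown pooling mode: {mode}")
    h, w = len(image), len(image[0])
    pooled = []
    for r in range(0, h - size + 1, stride):
        row = []
        for c in range(0, w - size + 1, stride):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(sum(window) / len(window) if mode == "average" else max(window))
        pooled.append(row)
    return pooled
```

With a 4 × 4 feature image, either mode returns a 2 × 2 pooled image, a fourfold reduction in data.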

Step 3: Flatten the n pooled images A_1, A_2, …, A_n into vectors in order and concatenate them into one long vector A. Input A into the fully connected layer; its output is the initial feature vectors V_1', V_2', …, V_n' corresponding to the target images P_1, P_2, …, P_n.
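Step 3 can be illustrated with a tiny fully connected layer. The sketch assumes that the layer's output is split into n equal chunks, one per frame, to give V_1', …, V_n'. The text does not say how the output is partitioned, and the weights here are placeholders rather than trained values.

```python
def flatten(matrix):
    # Row-major flattening of a 2-D list.
    return [v for row in matrix for v in row]


def fully_connected(x, weights, bias):
    # y = W x + b, with W given as a list of rows.
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match the input length")
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def initial_feature_vectors(pooled_images, weights, bias):
    # Concatenate the flattened pooled images A_1..A_n into one long vector A,
    # apply the fully connected layer, then split the output into n equal chunks
    # (assumption: chunk i is the initial feature vector V_i' of frame P_i).
    n = len(pooled_images)
    long_vector = [v for image in pooled_images for v in flatten(image)]
    out = fully_connected(long_vector, weights, bias)
    if len(out) % n:
        raise ValueError("output size must be divisible by the number of frames")
    k = len(out) // n
    return [out[i * k:(i + 1) * k] for i in range(n)]
```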

Step 4: Input the initial feature vectors V_1', V_2', …, V_n' output by the fully connected layer into the position vector layer. The position vector layer may filter V_1', V_2', …, V_n' with a filtering and tracking algorithm to obtain the filtered feature vectors V_1'', V_2'', …, V_n'', and then splice the filtered feature vectors with the initial feature vectors V_1', V_2', …, V_n' to obtain the target feature vectors V_1, V_2, …, V_n of the target subjects. Applying deconvolution to the target feature vectors V_1, V_2, …, V_n yields the position vectors of the target subjects. Taking target image P_i as an example and assuming that P_i contains m target subjects, the position vectors of the m target subjects in P_i can be extracted by inputting the initial feature vector V_i' into the position vector layer, where m is a natural number and 1 ≤ i ≤ n. The position vector layer may be trained on a large number of known target images and the known position vectors of their target subjects. For brevity, only the extraction of the position vectors of the m target subjects in P_i is described above; the position vectors of the m target subjects in each of P_1, P_2, …, P_n are extracted in the same way and are not described again here.
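The filtering and splicing in step 4 can be sketched as below. The text does not name its filtering and tracking algorithm, so exponential smoothing across frames is swapped in purely as a stand-in, and "splicing" is taken to mean concatenating V_i'' with V_i'. The deconvolution that maps V_i to position vectors is not shown, and the function names are hypothetical.

```python
def smooth_across_frames(vectors, alpha=0.5):
    """Stand-in filter: exponential smoothing of each dimension across frames.

    V_1'' = V_1';  V_i'' = alpha * V_i' + (1 - alpha) * V_{i-1}''.
    """
    filtered, prev = [], None
    for v in vectors:
        if prev is None:
            cur = list(v)
        else:
            cur = [alpha * a + (1 - alpha) * p for a, p in zip(v, prev)]
        filtered.append(cur)
        prev = cur
    return filtered


def target_feature_vectors(initial, alpha=0.5):
    # Splice (concatenate) each filtered vector V_i'' with its initial vector V_i'
    # to obtain the target feature vector V_i.
    filtered = smooth_across_frames(initial, alpha)
    return [f + list(v) for f, v in zip(filtered, initial)]
```

Concatenating keeps both the raw per-frame features and the temporally smoothed ones, so later layers can use either.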

In a specific embodiment of the present application, a position vector represents the position of the corresponding target subject in the target image. Positions are relative: when target subject 2 is referenced against different target subjects 1, its position differs. In this embodiment, a position vector may include the abscissa and ordinate of the target subject's center point in the target image, together with the subject's width and height. A pose vector represents the action of a target subject. Different target subjects usually perform different actions, so their pose vectors usually differ as well. For example, if the target subject is a child, the child's action may be sleeping, jumping, cheering, clapping, crying, and so on.
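A position vector of the form described above, and the dependence of a subject's position on the reference subject, can be sketched as follows. The (cx, cy, w, h) layout follows the text; normalizing the offset by the reference subject's size and all names are assumptions for illustration. The coordinates are made up to mirror FIG. 3, with the image y-axis pointing down.

```python
from dataclasses import dataclass


@dataclass
class PositionVector:
    # Center abscissa/ordinate plus width and height, in pixels.
    cx: float
    cy: float
    w: float
    h: float


def relative_position(subject: PositionVector, reference: PositionVector):
    # Offset of `subject` from `reference`, scaled by the reference size.
    # A negative y offset means the subject is higher in the image.
    return ((subject.cx - reference.cx) / reference.w,
            (subject.cy - reference.cy) / reference.h)


person = PositionVector(100, 200, 50, 100)
steps = PositionVector(100, 260, 200, 40)
basket = PositionVector(150, 250, 30, 30)
print(relative_position(person, steps))  # (0.0, -1.5): the person is above the steps
```

The same person gets a different relative position when the basket is used as the reference instead of the steps.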

A2: The cloud 110 extracts the pose vectors of multiple target subjects, including the target person, from the target image.

In a specific embodiment of the present application, the cloud 110 may likewise extract the pose vectors of the target subjects with a convolutional neural network. Taking the multi-frame image shown in FIG. 3 as an example again, FIG. 3 contains n frames of target images P_1, P_2, …, P_n, where n is a natural number greater than or equal to 1.

Steps 1, 2 and 3 are the same as steps 1, 2 and 3 of A1 and are not repeated here.

Step 4: Input the initial feature vectors V_1', V_2', …, V_n' output by the fully connected layer into the pose vector layer. The pose vector layer may filter V_1', V_2', …, V_n' with a filtering and tracking algorithm to obtain the filtered feature vectors V_1'', V_2'', …, V_n'', and splice them with the initial feature vectors V_1', V_2', …, V_n' to obtain the target feature vectors V_1, V_2, …, V_n of the target subjects. Applying the K-means algorithm to V_1, V_2, …, V_n yields the pose vectors of the target subjects. Taking target image P_i as an example and assuming that P_i contains m target subjects, the pose vectors Z_1^i, Z_2^i, …, Z_m^i of the m target subjects in P_i can be extracted by inputting the initial feature vector V_i' into the pose vector layer, where m is a natural number and 1 ≤ i ≤ n. The pose vector layer may be trained on a large number of known target images and the known pose vectors of their target subjects. For brevity, only the extraction of the pose vectors of the m target subjects in P_i is described above; the pose vectors of the m target subjects in each of P_1, P_2, …, P_n are extracted in the same way and are not described again here.
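The K-means step can be illustrated with plain Lloyd's iterations. How the clusters become pose vectors is not specified in the text; in this sketch the cluster index of each target feature vector is simply treated as a pose code, which is an assumption. Initializing with the first k points keeps the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's K-means on a list of equal-length vectors; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialization: the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Four hypothetical target feature vectors forming two obvious groups.
labels, centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(labels)  # [0, 0, 1, 1]
```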

A3: The cloud 110 inputs the position vectors and pose vectors of the multiple target subjects into the semantic description model to generate the high-level semantic description of the target image.

In a specific embodiment of the present application, the semantic description model can be expressed as:

$$y = f(x)$$

where x is the influencing factor of the high-level semantic description, y is the high-level semantic description, and f() is the mapping from the influencing factor to the high-level semantic description. f() can be obtained by training on a large number of known influencing factors and their known high-level semantic descriptions.

In one specific embodiment, the semantic description model may be as shown in FIG. 4. Inputting the position vectors of the target subjects, including the target person, extracted in the foregoing embodiment into the low-level semantic unit yields a low-level semantic description; inputting the pose vectors of the target subjects together with that low-level semantic description into the high-level semantic unit yields a high-level semantic description. The low-level semantic description is a low-level textual description of the target person, and the high-level semantic description is a high-level textual description of the target person. Different target persons usually produce different low-level and high-level semantic descriptions.

For example, taking the image in FIG. 3 again, suppose the extracted low-level semantic description of the target person (the elderly person) is "the elderly person is above the steps" and that of the target subject (the shoes) is "one shoe is on the elderly person's foot and the other is in front of the elderly person's feet". Inputting these two low-level semantic descriptions, together with the pose vectors of the elderly person and the shoes extracted in the foregoing embodiment, into the high-level semantic unit can produce the high-level semantic description "the elderly person fell from the steps".
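The data flow of FIG. 4 (positions into the low-level unit, then poses plus the low-level description into the high-level unit) can be mimicked with a toy, rule-based stand-in. In the described system both units are trained recurrent networks. The rule table, the pose labels (strings instead of vectors) and the pixel tolerance below are hypothetical and serve only to reproduce the FIG. 3 example.

```python
def low_level_unit(positions, x_tol=50):
    """Toy stand-in for the low-level semantic unit.

    positions maps a subject name to its center (cx, cy) in pixels, with the
    image y-axis pointing down. Emits "A above B" when A's center is higher
    than B's and the two centers are within x_tol pixels horizontally.
    """
    phrases = []
    for a, (ax, ay) in positions.items():
        for b, (bx, by) in positions.items():
            if a != b and ay < by and abs(ax - bx) <= x_tol:
                phrases.append(f"{a} above {b}")
    return sorted(phrases)


# Hypothetical rules standing in for the trained high-level unit:
# (required low-level phrases, required poses, high-level sentence).
HIGH_LEVEL_RULES = [
    ({"elderly person above steps"},
     {"elderly person": "sitting", "shoe": "off foot"},
     "the elderly person fell from the steps"),
]


def high_level_unit(low_level, poses):
    # Combine the low-level phrases with the subjects' poses.
    for phrases, required_poses, sentence in HIGH_LEVEL_RULES:
        if phrases <= set(low_level) and all(
            poses.get(name) == pose for name, pose in required_poses.items()
        ):
            return sentence
    return "no event recognised"


low = low_level_unit({"elderly person": (100, 200), "steps": (110, 260)})
print(high_level_unit(low, {"elderly person": "sitting", "shoe": "off foot"}))
```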

In a specific embodiment of the present application, the feature vector extraction, position vector extraction and pose vector extraction in the foregoing embodiments may be implemented in a single convolutional neural network or in separate convolutional neural networks. The convolutional neural network may be VGGNet, ResNet, FPNet, etc., which is not specifically limited here.

In a specific embodiment of the present application, the low-level and high-level semantic descriptions in the foregoing embodiments may be generated by a single recurrent neural network or by separate recurrent neural networks, which is not specifically limited here. The recurrent neural network may be a long short-term memory (LSTM) model, a bidirectional long short-term memory (BiLSTM) model, etc., which is not specifically limited here.
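For reference, one time step of a standard LSTM cell, the building block of the recurrent networks named above, is sketched below with plain Python lists. The weight shapes and the parameter layout are assumptions. A BiLSTM runs one such cell forward and another backward over the sequence and concatenates their hidden states.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps each gate name 'i', 'f', 'o', 'g' to a tuple (W, U, b):
    input weights W, recurrent weights U and bias b for that gate.
    """
    def gate(name, act):
        W, U, b = params[name]
        return [act(a + r + bb) for a, r, bb in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c = [ff * cp + ii * gg for ff, cp, ii, gg in zip(f, c_prev, i, g)]
    h = [oo * math.tanh(cc) for oo, cc in zip(o, c)]
    return h, c
```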

It can thus be seen that this embodiment uses convolutional neural networks to extract the position and pose vectors of the target subjects, including the target person, in the target image, and a recurrent neural network to generate the high-level semantic description. This better describes the states of and relationships among the target subjects in the target image and yields a more accurate semantic description of the image.

S103: When the high-level semantic description contains a preset keyword, the cloud 110 generates an alarm prompt and sends it to the terminal device 120.

In a specific embodiment of the present application, if the high-level semantic description of the target image generated by the cloud 110 contains a preset keyword, such as "fall", "tumble", "crying", "struggling", "convulsing" or "blood", the cloud 110 generates an alarm prompt, which may include the high-level semantic description, and sends it to the terminal device 120. Taking the image in FIG. 3 again, suppose the generated high-level semantic description of the target person is "the elderly person fell from the steps". It contains the keyword "fall" (in the form "fell"), so the cloud 110 generates the alarm prompt "the elderly person fell from the steps" and sends it to the terminal device 120.
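The keyword check in S103 can be sketched as whole-word matching on the high-level description. The English keyword set and the `AlarmPrompt` fields are hypothetical. Because English inflects verbs, forms such as "fell" are listed explicitly in this sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical English keyword set modeled on the examples in the text.
PRESET_KEYWORDS = {"fall", "falls", "fell", "fallen", "tumble", "tumbled",
                   "crying", "struggling", "convulsing", "blood"}


@dataclass
class AlarmPrompt:
    camera_id: str
    description: str  # the high-level semantic description
    keyword: str      # the preset keyword that triggered the alarm


def check_description(camera_id: str, description: str,
                      keywords=PRESET_KEYWORDS) -> Optional[AlarmPrompt]:
    # Return an alarm prompt if any preset keyword occurs as a whole word, else None.
    for word in re.findall(r"[a-z]+", description.lower()):
        if word in keywords:
            return AlarmPrompt(camera_id, description, word)
    return None
```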

It can thus be seen that, in this embodiment, when the high-level semantic description generated by the cloud 110 contains a preset keyword, an alarm prompt is generated and sent to the terminal device 120. The user can therefore learn the location and type of the alarm event as soon as possible, without downloading and viewing image files or calling up the camera device 130 to view the monitored area and judge the event subjectively. This improves the accuracy of event judgment and saves the user time.

S104: The terminal device 120 receives the alarm prompt sent by the cloud 110.

In a specific embodiment of the present application, upon receiving the alarm prompt from the cloud 110, the terminal device 120 sends a call request to the camera device 130. If the terminal device 120 is connected to the network through a wireless local area network (WLAN), the call request establishes a video call with the camera device 130. If it is connected by other means, such as 3G or 4G, the call request establishes a voice call with the camera device 130. During a voice call, the voice call interface of the terminal device 120 displays a video call shortcut; if that shortcut is triggered, the terminal device 120 stops the voice call and switches to a video call with the camera device 130.
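The call-mode logic of S104 can be modeled as a small state machine. This sketch assumes only two network categories (WLAN, and cellular such as 3G/4G) and uses hypothetical class names. It shows the initial mode chosen from the network type and the switch from a voice call to a video call when the video call shortcut is pressed.

```python
from enum import Enum


class Network(Enum):
    WLAN = "wlan"
    CELLULAR = "cellular"  # e.g. 3G or 4G


class CallMode(Enum):
    VIDEO = "video"
    VOICE = "voice"
    ENDED = "ended"


class CallSession:
    """Hypothetical terminal-side call state after an alarm prompt arrives."""

    def __init__(self, network: Network):
        # WLAN terminals open a video call; other connections open a voice call.
        self.mode = CallMode.VIDEO if network is Network.WLAN else CallMode.VOICE

    def shows_video_shortcut(self) -> bool:
        # The video call shortcut is displayed only on the voice call interface.
        return self.mode is CallMode.VOICE

    def press_video_shortcut(self) -> None:
        if not self.shows_video_shortcut():
            raise RuntimeError("video call shortcut is not displayed")
        # Stop the voice call and switch to a video call with the same camera.
        self.mode = CallMode.VIDEO

    def press_end(self) -> None:
        self.mode = CallMode.ENDED
```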

Taking the image in FIG. 3 again, the terminal device 120 receives the alarm prompt "the elderly person fell from the steps" from the cloud 110, and the user learns of this alarm event. When the user's terminal device 120 is connected to the network through a WLAN, as shown in FIG. 5 (a schematic diagram of one possible video call interface between the terminal device 120 and the camera device 130), the video call interface displays a "Start" shortcut, an "End" shortcut, and so on. Tapping "Start" establishes a video call with the camera device 130, so the user can view the monitored area and see exactly how the elderly person fell from the steps. Tapping "End" terminates the video call. When the terminal device 120 is not connected through a WLAN, as shown in FIG. 6 (a schematic diagram of one possible voice call interface between the terminal device 120 and the camera device 130), the voice call interface displays a "Start" shortcut, a "Video call" shortcut, an icon of a built-in application (such as the camera or photo album), an "End" shortcut, and so on. Tapping "Start" establishes a voice call with the camera device 130, over which the user can send voice messages to the camera device 130. Tapping "Video call" terminates the voice call and establishes a video call with the camera device 130, so the user can view the monitored area. Tapping the built-in application icon opens that application, which shows the icons of all camera devices 130 bound to the terminal device 120; from there the user can establish a video or voice call with any of these camera devices 130 and can switch between camera devices 130 during a call. Tapping "End" terminates the voice call with the camera device 130.

In a specific embodiment of the present application, the terminal device 120 can also view the image files in the cloud storage space and cloud storage subspaces on the cloud 110, and download, update or delete them as needed.
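The view, download, update and delete operations above can be pictured as a simple key-value store. The sketch below is hypothetical: the application does not say how subspaces are organised, so it assumes one subspace per camera, and the class and method names are illustrative only.

```python
from typing import Dict, List, Optional


class CloudStorageSpace:
    """Hypothetical cloud storage space split into per-camera subspaces."""

    def __init__(self) -> None:
        # subspace id (assumed here to be a camera id) -> {file name: file bytes}
        self._subspaces: Dict[str, Dict[str, bytes]] = {}

    def upload(self, subspace: str, name: str, data: bytes) -> None:
        # Called on the camera/cloud side when a new image file is stored.
        self._subspaces.setdefault(subspace, {})[name] = data

    def view(self, subspace: Optional[str] = None) -> List[str]:
        # View every file in the space, or only the files of one subspace.
        if subspace is not None:
            return sorted(f"{subspace}/{n}" for n in self._subspaces.get(subspace, {}))
        return sorted(f"{s}/{n}" for s, files in self._subspaces.items() for n in files)

    def download(self, subspace: str, name: str) -> bytes:
        return self._subspaces[subspace][name]

    def update(self, subspace: str, name: str, data: bytes) -> None:
        # Replace an existing file; updating a missing file is an error.
        if name not in self._subspaces.get(subspace, {}):
            raise KeyError(f"{subspace}/{name}")
        self._subspaces[subspace][name] = data

    def delete(self, subspace: str, name: str) -> None:
        # Raises KeyError if the file does not exist.
        del self._subspaces[subspace][name]
```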

In the above solution, the cloud 110 generates a high-level semantic description of the target image from the position vectors and pose vectors of multiple target subjects in the image, including the target person. If the high-level semantic description contains a preset keyword, the cloud 110 generates an alarm prompt and sends it to the terminal device 120, so that the user learns of the alarm event promptly. The terminal device 120 also provides a "Start" shortcut key, a "Video call" shortcut key, built-in application icons and the like, so the user can hold a video call or voice call with the camera device 130 as needed, which improves the user's interaction with the camera device 130. In addition, the embodiments of the present application closely integrate the terminal device 120, the cloud 110 and the camera device 130: an entry to the camera device 130 is built into a native application of the terminal device 120 (such as the camera or photo album application), so the user can interact with the camera device 130 through that application. Because the user does not need to install third-party application software on the terminal device 120 to interact with the camera device 130, image files captured by the camera device 130 can be kept from being obtained by third-party software, which protects the user's privacy.
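The alarm rule in the first sentence above (raise an alarm prompt only when the description contains a preset keyword) reduces to a membership test. The sketch below is a minimal illustration; `AlarmPrompt`, `check_alarm` and the keyword list are hypothetical names, and the application does not specify how keywords are matched, so plain case-insensitive substring matching is assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class AlarmPrompt:
    camera_id: str
    description: str
    matched_keywords: Tuple[str, ...]


def check_alarm(camera_id: str, description: str,
                keywords: Iterable[str]) -> Optional[AlarmPrompt]:
    """Return an alarm prompt if the description contains any preset keyword."""
    text = description.casefold()
    # Plain substring matching: also works for Chinese text, which has no
    # spaces between words, but "fire" would also match "fireplace".
    matched = tuple(k for k in keywords if k.casefold() in text)
    if not matched:
        return None  # no alarm: nothing is pushed to the terminal device
    return AlarmPrompt(camera_id, description, matched)
```

With hypothetical keywords `["fell", "fire", "intruder"]`, the description "The old man fell from the steps" yields a prompt whose `matched_keywords` is `("fell",)`. A production system would likely need word-boundary or token-level matching to avoid false positives.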

The embodiments of the present application involve the cloud 110. FIG. 7 is a schematic diagram of a possible cloud 110 involved in the present application. The owner of the cloud deploys the cloud computing infrastructure of the cloud 110 itself, that is, deploys computing resources 111 (e.g., servers), storage resources 112 (e.g., memory), network resources 113 (e.g., network cards), and so on. The owner of the public cloud (e.g., an operator) then virtualizes the computing resources 111, storage resources 112 and network resources 113 of the cloud computing infrastructure and provides corresponding services to the users of the cloud. The operator can provide the following three kinds of service to users: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS).

IaaS provides users with the use of the cloud computing infrastructure, including processing, storage, networking and other basic computing resources 111; users can deploy and run arbitrary software, including operating systems and applications, on the cloud 110. Users do not manage or control the underlying cloud computing infrastructure, but they control the choice of operating system, storage space and deployed applications, and may also have limited control over selected network components (e.g., firewalls and load balancers).

PaaS lets users deploy onto the cloud computing infrastructure applications that they have developed, or acquired, using programming languages and tools supported by the provider (e.g., Java, Python, .NET). Users do not need to manage or control the underlying cloud computing infrastructure, including networks, servers, operating systems and storage, but they control the deployed applications and possibly the configuration of the environment that hosts them.

SaaS provides users with applications that the operator runs on the cloud computing infrastructure; users access these applications from various terminal devices through a client interface such as a web browser. Users do not need to manage or control any of the cloud computing infrastructure, including networks, servers, operating systems and storage.

It can be understood that the operator leases services to different tenants through any of IaaS, PaaS and SaaS, and the data and configurations of different tenants are isolated from each other, which ensures the security and privacy of each tenant's data.

Those skilled in the art will understand that the cloud shown in FIG. 7 does not limit the cloud: it may include more or fewer services or facilities than shown, combine or split certain services or facilities, or use a different allocation of services or arrangement of facilities.

The embodiments of the present application involve a terminal device 120, which may be any of various terminal devices such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a mobile internet device (MID), a notebook computer, or a smart wearable device (such as a smart watch or smart band). The embodiments of the present application impose no limitation on this.

Taking a mobile phone as an example of the terminal device 120, FIG. 8 is a block diagram of part of the structure of a mobile phone 200 related to the embodiments of the present application. Referring to FIG. 8, the mobile phone 200 includes components such as a memory 211, a processor 212, an input/output (I/O) subsystem 213, other input device controllers 214, other input devices 215, a display controller 216, a display screen 217 and a sensor controller 218. Those skilled in the art will understand that the structure shown in FIG. 8 does not limit the mobile phone, which may include more or fewer components than shown, combine or split certain components, or arrange the components differently.

Each component of the mobile phone 200 is described in detail below with reference to FIG. 8:

The memory 211 can store software programs and modules; the processor 212 runs the software programs and modules stored in the memory 211 to execute the various functional applications and data processing of the mobile phone 200. The memory 211 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as sound playback or image playback), and the data storage area may store data created through use of the mobile phone 200 (such as audio data and a phone book). In addition, the memory 211 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The display screen 217 can display information entered by the user, information provided to the user, and the various menus of the mobile phone 200, and can also accept user input. Optionally, the display screen 217 may include a display panel and a touch panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. The touch panel, also called a touch screen or touch-sensitive screen, collects contact or non-contact operations performed by the user on or near it (for example, operations performed on or near the touch panel with a finger, stylus or any other suitable object or accessory, which may also include motion-sensing operations; operation types include single-point and multi-point control), and drives the corresponding connected device according to a preset program.

The I/O subsystem 213 controls external input and output devices and may include the other input device controllers 214, the sensor controller 218 and the display controller 216. Optionally, one or more other input device controllers 214 receive signals from and/or send signals to the other input devices 215, which may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels, and optical mice (here, a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen). Note that the other input device controllers 214 may be connected to any one or more of these devices. The display controller 216 in the I/O subsystem 213 receives signals from and/or sends signals to the display screen 217. After the display screen 217 detects user input, the display controller 216 converts the detected input into interaction with the user interface objects displayed on the display screen 217, thereby implementing human-computer interaction. The sensor controller 218 may receive signals from and/or send signals to one or more sensors.
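Converting a detected touch into "interaction with a user interface object" typically starts with hit testing: finding which on-screen object contains the touch point. The sketch below is a hypothetical illustration of that step only; the `Button` type, the pixel layout and the "Start"/"End" coordinates are invented for the example and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Button:
    label: str
    x: int       # left edge, pixels
    y: int       # top edge, pixels (y grows downward)
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # Half-open bounds so adjacent buttons never both claim a pixel.
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def hit_test(buttons: List[Button], px: int, py: int) -> Optional[str]:
    """Return the label of the topmost button under the touch point, if any."""
    # Later buttons are assumed to be drawn on top, so search from the end.
    for button in reversed(buttons):
        if button.contains(px, py):
            return button.label
    return None
```

For a hypothetical call interface with `Button("Start", 40, 900, 200, 120)` and `Button("End", 480, 900, 200, 120)`, a touch at (100, 950) resolves to `"Start"`, and a touch between the two buttons resolves to `None`.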

The processor 212 is the control center of the mobile phone 200. It connects all parts of the phone through various interfaces and lines, and performs the functions of the mobile phone 200 and processes data by running or executing the software programs and/or modules stored in the memory 211 and calling the data stored in the memory 211, thereby monitoring the phone as a whole. Optionally, the processor 212 may include one or more processing units. Preferably, the processor 212 may integrate an application processor, which mainly handles the operating system, user interface and applications, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 212.

Although not shown, the mobile phone 200 may further include a power supply (such as a battery) that powers the components, a radio frequency (RF) circuit, an audio circuit, sensors, a camera, a Bluetooth module and the like, which are not described further here.

The embodiments of the present application involve a camera device 130, which may be any of various camera devices such as an analog camera, a network camera or a smart camera. The present application takes a network camera as an example: FIG. 9 is a block diagram of part of the structure of a possible network camera 300 involved in the present application. The network camera 300, also called an IP camera (IPC), uses an embedded architecture and integrates video and audio capture, signal processing, encoding and compression, front-end storage, network transmission and other functions; combined with a networked video storage and recording system and management platform software, it can form a large-scale, distributed network video surveillance system. Referring to FIG. 9, the network camera 300 includes components such as a lens and sensor 311, an encoding processor 312 and a network camera main control board 313. Those skilled in the art will understand that the structure shown in FIG. 9 does not limit the camera, which may include more or fewer components than shown, combine or split certain components, or arrange the components differently.

Each component of the network camera 300 is described in detail below with reference to FIG. 9:

The lens in the lens and sensor 311 is a key component of the video surveillance system, and its quality directly affects the overall quality of the network camera 300. The lens forms an image of the external scene on the sensor. At present, the lenses of network cameras 300 are all screw-mount and usually consist of a group of lens elements and a diaphragm. Lenses are divided into manual iris (MI) and auto iris (AI) types: a manual iris lens suits scenes whose brightness does not change, while an auto iris lens adjusts its aperture automatically when the brightness changes and therefore suits scenes with varying brightness. Optionally, the lens may be a standard lens, a telephoto lens, a zoom lens, a varifocal lens or the like, and may be made of glass or plastic.
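How an auto iris lens "adjusts its aperture automatically when the brightness changes" can be illustrated with a simple proportional feedback loop. This is only a sketch under assumptions of our own: the application does not describe the control law, so the normalised aperture and brightness scales, the target of 0.5, the gain of 0.5 and the linear toy sensor model in `simulate` are all hypothetical.

```python
def auto_iris_step(aperture: float, measured: float,
                   target: float = 0.5, gain: float = 0.5) -> float:
    """One step of a hypothetical proportional auto-iris loop.

    aperture: relative iris opening, 0.0 (closed) to 1.0 (fully open).
    measured: mean image brightness on the sensor, normalised to 0.0-1.0.
    """
    error = target - measured  # positive: image too dark, so open the iris
    return min(1.0, max(0.0, aperture + gain * error))


def simulate(scene_luminance: float, aperture: float = 0.2, steps: int = 60):
    """Run the loop against a toy sensor whose brightness is aperture * luminance."""
    for _ in range(steps):
        measured = min(1.0, aperture * scene_luminance)  # sensor saturates at 1.0
        aperture = auto_iris_step(aperture, measured)
    return aperture, min(1.0, aperture * scene_luminance)
```

In this toy model the loop settles at `aperture = target / scene_luminance` whenever that value is below 1.0, and it stays stable as long as `gain * scene_luminance` is between 0 and 2. A manual iris lens corresponds to never calling `auto_iris_step`, which is acceptable only when the scene brightness is constant.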

The sensor in the lens and sensor 311 may be an image sensor, such as a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor, which converts the optical signal received on the sensor (the image of the object) into an electrical signal and outputs it through a drive circuit to the encoding processor 312. The encoding processor 312 optimizes the digital image signal captured by the lens and sensor 311, for example its color, sharpness or white balance, and then passes it as a network video signal to the network camera main control board 313. The main control board 313 provides a Bayonet Neill-Concelman (BNC) video output, a network communication interface, audio input, audio output, alarm output, alarm input, a serial communication interface and other functions. The encoding processor 312 optimizes the digital image signals transmitted from the lens and sensor 311 and may include an image signal processor (ISP), an image decoder or the like, which is not specifically limited here.
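As one concrete example of the white-balance optimization mentioned above, the classic gray-world method scales each color channel so that the average color of the frame becomes neutral gray. The application does not say which algorithm the encoding processor 312 uses, so this is a swapped-in, textbook technique shown on plain RGB tuples (0-255 floats) rather than on real sensor data.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), each 0.0-255.0


def gray_world_white_balance(pixels: List[Pixel]) -> List[Pixel]:
    """Scale R, G and B so that their means become equal (gray-world assumption)."""
    n = len(pixels)
    if n == 0:
        return []
    # Per-channel means over the whole frame.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # Gain that maps each channel mean onto the common gray level;
    # a channel that is entirely zero is left unchanged.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Apply the gains and clip to the 8-bit range.
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]
```

For a frame with a strong red cast, such as `[(200.0, 100.0, 50.0), (100.0, 50.0, 25.0)]`, each output pixel has equal R, G and B values. Gray-world works poorly on scenes dominated by one color, which is one reason camera ISPs often combine several estimators.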

Although not shown, the network camera 300 may also include a power supply (such as a battery) that powers the components, an optical filter, a Bluetooth module and the like, which are not described further here.

Those of ordinary skill in the art will appreciate that the example units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the cloud 110, the terminal device 120, the camera device 130 and the units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for alarming by remote camera shooting, characterized in that the method comprises:

the camera device shoots a target person to obtain a target image and sends the target image to the cloud;

the cloud receives the target image sent by the camera device and generates a high-level semantic description of the target image, wherein the high-level semantic description describes the target person at a high level in text form;

when the high-level semantic description contains a preset keyword, the cloud generates an alarm prompt and sends the alarm prompt to a terminal device;

and the terminal device receives the alarm prompt sent by the cloud.

2. The method of claim 1, wherein generating, by the cloud, the high-level semantic description of the target image comprises:

the cloud extracts, from the target image, position vectors of a plurality of target subjects including the target person;

the cloud extracts, from the target image, pose vectors of the plurality of target subjects including the target person;

and the cloud inputs the position vectors and the pose vectors of the plurality of target subjects into a semantic description model to generate the high-level semantic description of the target image.

3. The method of claim 2, wherein the semantic description model comprises a low-level semantic unit and a high-level semantic unit, and inputting the position vectors and the pose vectors of the plurality of target subjects into the semantic description model to generate the high-level semantic description of the target image comprises:

inputting the position vectors of the plurality of target subjects into the low-level semantic unit to obtain a low-level semantic description;

inputting the pose vectors of the plurality of target subjects and the low-level semantic description into the high-level semantic unit to obtain the high-level semantic description.
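The two-stage structure of claims 2 and 3 (positions into a low-level unit, then poses plus the low-level description into a high-level unit) can be illustrated with a toy, rule-based stand-in. The claims do not define the vector formats or the model internals, and a real semantic description model would presumably be learned; everything below is hypothetical: positions are bounding boxes, only `pose[0]` (torso tilt from vertical, in degrees) is used, the low-level description is a list of (subject, relation, object) triples, and the 60-degree fall threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical position vector: bounding box (x_min, y_min, x_max, y_max) in
# pixels, with y growing downward as in most image coordinate systems.
Box = Tuple[float, float, float, float]
Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class TargetSubject:
    name: str
    position: Box
    # Hypothetical pose vector; only pose[0] (torso tilt from vertical, in
    # degrees) is used. None for non-human subjects such as steps.
    pose: Optional[Tuple[float, ...]] = None


def low_level_unit(subjects: List[TargetSubject]) -> List[Triple]:
    """Toy low-level unit: spatial relations derived from positions only."""
    triples: List[Triple] = []
    for a in subjects:
        for b in subjects:
            if a is b:
                continue
            ax0, _, ax1, a_bottom = a.position
            bx0, by0, bx1, by1 = b.position
            # "a on b": boxes overlap horizontally and a's bottom edge lies inside b.
            if ax0 < bx1 and bx0 < ax1 and by0 <= a_bottom <= by1:
                triples.append((a.name, "on", b.name))
    return triples


def high_level_unit(subjects: List[TargetSubject], low_level: List[Triple],
                    fall_tilt_deg: float = 60.0) -> str:
    """Toy high-level unit: combine pose vectors with the low-level relations."""
    sentences = []
    for s in subjects:
        if s.pose is None:
            continue  # only subjects with a pose (people) get a sentence
        lying = s.pose[0] >= fall_tilt_deg
        support = next((o for (subj, rel, o) in low_level
                        if subj == s.name and rel == "on"), None)
        if lying and support:
            sentences.append(f"the {s.name} fell on the {support}")
        elif lying:
            sentences.append(f"the {s.name} is lying down")
        elif support:
            sentences.append(f"the {s.name} is standing on the {support}")
        else:
            sentences.append(f"the {s.name} is standing")
    return "; ".join(sentences)


def semantic_description(subjects: List[TargetSubject]) -> str:
    return high_level_unit(subjects, low_level_unit(subjects))
```

For a scene containing `TargetSubject("old man", (100, 250, 300, 320), pose=(80.0,))` and `TargetSubject("steps", (0, 300, 400, 480))`, the low-level unit yields `[("old man", "on", "steps")]` and the full pipeline returns `"the old man fell on the steps"`, a text description to which the keyword check of claim 1 could then be applied.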

4. The method of any one of claims 1 to 3, wherein the method further comprises:

the terminal device sends a call request to the camera device upon receiving the alarm prompt sent by the cloud;

when the terminal device accesses the network through a wireless local area network (WLAN), the terminal device establishes a video call with the camera device;

and when the terminal device does not access the network through a WLAN, the terminal device establishes a voice call with the camera device.

5. The method of claim 4, wherein, when the terminal device establishes a voice call with the camera device, the method further comprises:

displaying a video call shortcut key on a voice call interface of the terminal device;

and when the video call shortcut key is triggered, the terminal device terminates the voice call with the camera device and establishes a video call with the camera device.

6. A system for alarming by remote camera shooting, comprising:

a camera device, configured to shoot a target person to obtain a target image and send the target image to a cloud;

the cloud, configured to receive the target image sent by the camera device and generate a high-level semantic description of the target image, wherein the high-level semantic description describes the target person at a high level in text form;

when the high-level semantic description contains a preset keyword, the cloud is further configured to generate an alarm prompt and send the alarm prompt to a terminal device;

and the terminal device, configured to receive the alarm prompt sent by the cloud.

7. The system of claim 6, wherein the cloud being configured to generate the high-level semantic description of the target image comprises:

the cloud is configured to extract, from the target image, position vectors of a plurality of target subjects including the target person;

the cloud is configured to extract, from the target image, pose vectors of the plurality of target subjects including the target person;

the cloud is configured to input the position vectors and the pose vectors of the plurality of target subjects into a semantic description model to generate the high-level semantic description of the target image.

8. The system of claim 7, wherein the semantic description model comprises a low-level semantic unit and a high-level semantic unit, and wherein inputting the position vectors of the plurality of target subjects and the pose vectors of the plurality of target subjects into the semantic description model to generate a high-level semantic description of the target image comprises:

9. The system of any one of claims 6 to 8, further comprising:

the terminal device is configured to send a call request to the camera device upon receiving the alarm prompt sent by the cloud;

10. The system of claim 9, wherein, when the terminal device establishes a voice call with the camera device, the system further comprises: