CN120075566A

CN120075566A - Safety management camera module and method based on visual model

Info

Publication number: CN120075566A
Application number: CN202510148783.6A
Authority: CN
Inventors: 王波; 张腾予
Original assignee: Chengdu Baisnake Technology Co ltd
Current assignee: Chengdu Baisnake Technology Co ltd
Priority date: 2025-02-11
Filing date: 2025-02-11
Publication date: 2025-05-30

Abstract

The present invention belongs to the field of security monitoring, and specifically, is a security management camera module and method based on a visual model. The security management camera module comprises: an image acquisition module, a Doubao visual large model embedding module, Qwen2‑VL, a data processing and analysis module, and a communication module. The present invention continuously acquires images through the image acquisition module, and after in-depth analysis and processing by the Doubao visual large model embedding module and the data processing and analysis module, transmits information to the outside through the communication module, the storage module saves data, the power management module ensures power supply, and the alarm module promptly issues an alarm when necessary, thereby forming a complete set of intelligent security management and monitoring system. By combining with advanced visual large models, more accurate, intelligent and efficient security monitoring and management functions are achieved, and the false alarm and missed alarm rates are effectively reduced, helping security management personnel to better perform their duties and ensure the safety of the monitored area.

Description

Safety management camera module and method based on visual model

Technical Field

The invention belongs to the field of safety monitoring, and particularly relates to a safety management camera module and method based on a visual model.

Background

With the development of society and the continuous improvement of people's safety consciousness, safety management cameras are widely used in numerous places such as residential communities, business office areas, industrial parks, and the like. The traditional safety management camera can only realize a simple video image acquisition function, has limited capability of analyzing and judging the acquired image, generally relies on preset fixed rules to identify basic conditions such as a moving target, a specific-shape object and the like, is difficult to accurately cope with complex and changeable actual monitoring scenes, and is easy to cause false alarm, missing alarm and the like.

When facing massive monitoring video data, the traditional camera lacks effective data integration and depth mining capability, valuable safety related information cannot be timely extracted to assist management personnel in making efficient decisions, and the intelligent level and the practical effect of safety management are greatly limited. Therefore, a novel camera technology with stronger image analysis capability and capable of deeply mining data value and accurately performing security management judgment is urgently needed.

Therefore, the invention provides a safety management camera module and a safety management camera method based on a visual model.

Disclosure of Invention

To overcome the deficiencies of the prior art, the invention solves at least one of the technical problems presented in the Background.

The technical scheme adopted by the invention for solving the technical problems is that the safety management camera module based on the visual model is characterized by comprising the following components:

The image acquisition module consists of a high-definition optical lens and an image sensor component and is responsible for acquiring real-time video image information in a monitoring area;

The Doubao visual large model embedding module is used for receiving the video image data from the image acquisition module, and its visual large model can accurately identify and classify objects, persons and behaviors in the image;

Qwen2-VL, which recognizes object relationships in complex scenes, handwriting and multilingual image text, has excellent visual reasoning capability, solves problems by means of chart analysis, understands long videos, supports real-time conversation for multiple applications, supports multiple languages for the convenience of global users, processes images of arbitrary resolution, integrates multi-dimensional information by means of its innovative architecture, and extends to applications such as photo-based question answering;

The data processing and analyzing module cooperates with the Doubao visual large model embedding module to further sort, judge and integrate the output results of that module, and also compares real-time and historical monitoring data through an algorithm, mines safety trends and anomalies, generates an evaluation report and assists safety management decisions;

The communication module supports multiple communication protocols such as Wi-Fi, Ethernet and 4G/5G, and can transmit the video image data collected by the camera, the analysis results of the Doubao visual large model and the security risk assessment reports generated by the data processing and analysis module to a remote monitoring management platform or a mobile terminal of security personnel in real time, ensuring the timely transmission of security information and making it convenient for management personnel to remotely keep track of the condition of the monitoring area and respond quickly;

The power management module provides stable and reliable power supply for the whole camera system, can be connected with a mains supply, is internally provided with a standby power supply such as a lithium battery, and has the functions of power monitoring, intelligent switching and energy-saving control;

The storage module is provided with a large-capacity storage medium, such as a solid state disk, and is mainly used for storing original video image data acquired by a camera, and key information obtained after processing of each link and the content of an analysis result;

The alarm module cooperates with the data processing and analyzing module to monitor the safety condition of the monitoring area, triggers various alarm modes, deters potentially dangerous persons by emitting sound and light signals, and pushes messages, voice prompts and the like to the mobile terminal of the manager.

Preferably, the image acquisition module has functions of adjusting focal length, aperture and the like, can adapt to the acquisition requirements of clear images under different distances and illumination environment conditions, and acquired image data is transmitted to a subsequent module in a digital signal form in real time for processing.

Preferably, the Doubao visual large model embedding module has a built-in pre-trained Doubao visual large model. The model is deeply trained on massive image data covering different scenes, various objects and persons, and has strong image feature extraction, semantic understanding and pattern recognition capabilities. After receiving the video image data from the image acquisition module, it can accurately recognize and classify the objects, persons and behaviors in the image, for example, accurately distinguishing normal pedestrians, workers and suspicious persons, and recognizing specific dangerous objects and abnormal scene arrangements such as blocked fire passages. It can also analyze the movement trajectories and postures of persons to judge whether abnormal behaviors such as climbing or fighting exist.

Preferably, Qwen2-VL comprises:

Strong recognition capability: it can accurately recognize multiple objects and their relationships in a complex scene, and can recognize handwritten text and multilingual image text, including most European languages, Japanese, Korean and Arabic;

Excellent visual reasoning capability: it can solve complex mathematical problems through chart analysis, extract information from real-world images and charts, and follow instructions more closely to solve practical problems;

Long video understanding and real-time dialogue: it can understand video content longer than 20 minutes, can continuously provide information and support in real-time dialogue, and can be applied to video question answering, dialogue, content creation and the like;

Multi-language support: in addition to the common English and Chinese, it supports understanding image text in multiple languages, which is convenient for users around the world;

Innovative architecture: a serial structure of ViT plus Qwen2 is adopted, native dynamic resolution and multimodal rotary position embedding (M-RoPE) are supported, image input of any resolution can be processed, and multi-dimensional position information can be captured and integrated simultaneously; applications include photo-based question answering, AI image generation, making phone calls and sending messages.

Preferably, the data processing and analyzing module works in cooperation with the Doubao visual large model embedding module. On the one hand, it performs further data organization and logical judgment on the recognition and analysis results output by the large model, and associates and integrates related information acquired at different moments and from different angles. On the other hand, it uses a built-in algorithm to compare real-time monitoring data with historical monitoring data and mine potential safety trends and abnormal changes, such as a recent change in the frequency of abnormal gatherings of people in a certain area, and generates a safety risk assessment report on that basis, providing detailed data support for subsequent safety management decisions.
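The comparison of real-time data against historical data described above can be illustrated with a minimal sketch. The source does not name the built-in algorithm, so the z-score test, the function names `gathering_anomaly_score` and `is_abnormal`, and the threshold of 3.0 are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def gathering_anomaly_score(history, current):
    """Return how many standard deviations `current` lies above the historical mean.

    history: past per-interval gathering counts for one area (hypothetical data).
    current: the count observed in the latest interval.
    """
    if len(history) < 2:
        return 0.0  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: any increase is treated as maximally unusual.
        return float("inf") if current > mu else 0.0
    return (current - mu) / sigma


def is_abnormal(history, current, threshold=3.0):
    """Flag the interval when the score exceeds an assumed threshold."""
    return gathering_anomaly_score(history, current) > threshold
```

A real module would likely combine several such indicators (per area, per time of day) before writing them into the risk assessment report; a single score is shown here only to make the comparison step concrete.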

Preferably, the power management module is responsible for providing a stable and reliable power supply for the whole camera system. It can be connected to the mains supply and has a built-in standby power supply, such as a lithium battery, and it has power monitoring, intelligent switching and energy-saving control functions. When the mains supply is cut off, it can automatically switch to the standby power supply to keep the camera working normally, and it can dynamically adjust the power supplied to each module according to actual monitoring requirements.
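The intelligent switching and energy-saving control described above might look like the following sketch. The source gives no switching rules, so the `PowerStatus` type, the module names, and the 20 % low-battery cutoff are hypothetical and serve only to show one possible decision logic.

```python
from dataclasses import dataclass


@dataclass
class PowerStatus:
    mains_ok: bool          # True when the mains supply is present
    battery_percent: float  # remaining charge of the standby battery, 0-100


def select_power_source(status: PowerStatus) -> str:
    """Prefer mains; fall back to the standby battery when mains is lost."""
    if status.mains_ok:
        return "mains"
    if status.battery_percent > 0:
        return "battery"
    return "none"


def module_power_plan(status: PowerStatus, low_battery_percent: float = 20.0) -> dict:
    """Decide which modules stay powered (assumed energy-saving policy)."""
    source = select_power_source(status)
    plan = {"image_acquisition": True, "model": True, "communication": True,
            "storage": True, "alarm": True}
    if source == "none":
        return {name: False for name in plan}
    if source == "battery" and status.battery_percent < low_battery_percent:
        # On a nearly empty battery, shed the analysis load but keep
        # capture, storage, communication and alarms alive.
        plan["model"] = False
    return plan
```

Shedding the model first is a design choice of this sketch, not of the source; an actual device could equally lower the frame rate or the transmission frequency.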

Preferably, the storage module is provided with a large-capacity storage medium, such as a solid state disk, and is used for storing collected original video image data, processed key information and analysis result content, supporting data classification storage according to various modes of time and event types, facilitating subsequent query, playback and data tracing operation, and the stored data can be backed up periodically to prevent loss.
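Classified storage by time and event type, as described above, can be sketched as a path-building rule. The directory layout, the `archive_path` function, and the `.mp4` extension are assumptions; the source only states that data is classified by time and event type.

```python
from datetime import datetime
from pathlib import PurePosixPath


def archive_path(root: str, captured_at: datetime, event_type: str,
                 camera_id: str) -> PurePosixPath:
    """Build a storage path classified by event type, then date, then hour."""
    # Normalize the event label so it is safe to use as a directory name.
    safe_event = event_type.strip().lower().replace(" ", "_") or "unlabeled"
    return (PurePosixPath(root)
            / safe_event
            / captured_at.strftime("%Y-%m-%d")
            / captured_at.strftime("%H")
            / f"{camera_id}_{captured_at.strftime('%Y%m%dT%H%M%S')}.mp4")
```

With this layout, a query such as "all alarm events on a given day" reduces to listing one directory, which is what makes later playback and tracing convenient.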

The use method of the safety management camera based on the visual model is characterized by comprising the following steps of:

S1, installation and initialization: the camera is installed at a suitable position requiring safety monitoring, such as the entrance and exit of a building, a corridor or a key passage area of a park, ensuring that the lens field of view of the image acquisition module covers the target monitoring area; the mains supply is connected and the camera is started, at which point the power management module performs a self-check and supplies power normally to each module; the camera system starts the initialization program; the communication module automatically connects to a preset network, for example a local area network or the Internet through a 4G/5G network; the storage module completes formatting preparation and waits for data storage; and the Doubao visual large model embedding module loads the pre-trained model parameters and enters a ready state;

S2, image acquisition and analysis: the image acquisition module continuously acquires video images of the monitoring area at a set frame rate, such as 25 frames per second, and transmits the real-time image data to the Doubao visual large model embedding module; after receiving the images, the visual large model embedding module performs feature extraction, object recognition and behavior analysis on each frame, for example identifying the persons in the picture by comparison with a pre-stored image database of authorized personnel and judging whether their walking direction and behavior are compliant, while also identifying various objects in the scene and their states, such as fire-fighting facilities and vehicles, and outputs the corresponding analysis results to the data processing and analysis module;

S3, data processing and comprehensive judgment: after the data processing and analyzing module collects the analysis results of the visual large model, on the one hand it integrates the related data collected at the same moment by cameras at different angles to construct a complete view of the monitored scene; on the other hand it combines historical monitoring data and judges the safety state of the current monitoring area through a built-in risk assessment algorithm, for example by calculating the probability of abnormal behavior of the persons present and the level of potential safety hazards in the environment, and generates a safety risk assessment report;

S4, information transmission and remote monitoring: the communication module sends the collected original video image data, the analysis results of the visual large model and the generated security risk assessment reports to a remote monitoring management platform and the mobile terminals of security personnel, either according to a set period, such as every 1 minute, or in real time for an urgent high-risk event;

S5, alarm triggering and response: when the data processing and analyzing module judges that a safety event reaching a preset alarm threshold has occurred in the monitoring area, for example when an unauthorized person is detected trying to break into a restricted area or fire smoke is detected, the alarm module is started immediately;

S6, data storage and management are carried out, the storage module continuously stores the original video image data acquired by the image acquisition module, classified archiving is carried out according to time sequence and event labels, such as each alarm event and daily inspection period are taken as labels, subsequent query and playback are facilitated, meanwhile, the processed analysis result and safety risk assessment report key information are stored, data statistics and trend analysis are facilitated for management personnel, the data are taken as the basis of safety management decision, and the stored data are backed up to external storage equipment or cloud end periodically, so that safety and integrity of the data are ensured.
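The timing rules in steps S4 and S5 can be sketched as two small decision functions. The 60-second period follows the "every 1 minute" example in S4; the numeric risk scale, the 0.8 alarm threshold, and the function names are assumptions, since the source speaks only of a "preset alarm threshold".

```python
def should_transmit(now_s: float, last_sent_s: float, risk: float,
                    period_s: float = 60.0, alarm_threshold: float = 0.8) -> bool:
    """Send on the regular period, or immediately for a high-risk event (S4)."""
    if risk >= alarm_threshold:
        return True
    return (now_s - last_sent_s) >= period_s


def alarm_actions(risk: float, alarm_threshold: float = 0.8) -> list:
    """Return the alarm modes to start once the threshold is reached (S5)."""
    if risk < alarm_threshold:
        return []
    # Alarm modes named in the source: sound and light signals on site,
    # plus pushed messages and voice prompts to the manager's terminal.
    return ["sound_light_signal", "push_message", "voice_prompt"]
```

Sharing one threshold between transmission and alarm keeps the two steps consistent: any event that raises an alarm is also reported without waiting for the next period.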

The beneficial effects of the invention are as follows:

1. According to the safety management camera module and the safety management camera method based on the visual model, by means of the strong image understanding capability of the Doubao visual large model, various elements in a monitored scene can be identified with high precision, misjudgments caused by factors such as environmental interference and object similarity are greatly reduced, safety-related key information can be accurately captured under different conditions such as daytime, night, and complex indoor and outdoor scenes, and the effectiveness of safety monitoring is improved.

2. According to the safety management camera module and the safety management camera method based on the visual model, through the comprehensive application of the data processing and analyzing module to the output result and the historical data of the visual large model, the current safety condition can be known, potential safety hazard trends can be dug, the generated safety risk assessment report can assist management staff to make a coping strategy in advance, passive response is changed into active prevention, and the prospective and scientificity of overall safety management are improved.

3. According to the safety management camera module and the safety management camera method based on the visual model, the communication module ensures that monitoring data and analysis results can be transmitted to the remote terminal in real time, a manager can master the situation and give instructions at any time and any place without monitoring the scene, convenience and timeliness of safety management work are greatly improved, and the safety management camera module and the safety management camera method based on the visual model are particularly suitable for large-area and multi-area centralized safety management scenes.

4. According to the safety management camera module and the safety management camera method based on the visual model, the power management module ensures that the camera works stably in various power supply environments, avoiding monitoring gaps caused by power failure, and the storage module provides convenient data storage and query functions, which helps with subsequent incident review, evidence collection and similar work, and enhances the practicability and reliability of the whole safety management system.

5. According to the safety management camera module and the safety management camera method based on the visual model, the alarm module can rapidly start the corresponding alarm mode according to accurate safety event judgment, and notify related personnel at the first time, so that safety risks are controlled to be in the minimum range, further expansion of safety accidents is avoided, and safety of personnel and property in a monitoring area is guaranteed.

Drawings

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a camera in the present invention;

FIG. 2 is a flowchart of the method of using the camera according to the present invention.

Detailed Description

The invention is further described in connection with the following detailed description in order to make the technical means, the creation characteristics, the achievement of the purpose and the effect of the invention easy to understand.

As shown in fig. 1 and fig. 2, a security management camera module based on a visual model according to an embodiment of the present invention is characterized in that the security management camera module includes:

The image acquisition module has the functions of adjusting focal length, aperture and the like, can adapt to the acquisition requirements of clear images under different distances and illumination environment conditions, and the acquired image data is transmitted to the subsequent module in a digital signal form in real time for processing.

The Doubao visual large model embedding module has a built-in pre-trained Doubao visual large model. The model is deeply trained on massive image data covering different scenes, various objects and persons, and has strong image feature extraction, semantic understanding and pattern recognition capabilities. After receiving the video image data from the image acquisition module, it can accurately recognize and classify the objects, persons and behaviors in the images, for example, accurately distinguishing normal pedestrians, workers and suspicious persons, and recognizing specific dangerous objects and abnormal scene arrangements such as blocked fire passages. It can also analyze the movement trajectories and postures of persons to judge whether abnormal behaviors such as climbing or fighting exist.

Qwen2-VL comprises:

The data processing and analyzing module works in cooperation with the Doubao visual large model embedding module. It performs further data organization and logical judgment on the recognition and analysis results output by the large model, and associates and integrates related information acquired at different moments and from different angles. It also uses a built-in algorithm to compare real-time monitoring data with historical monitoring data and mine potential safety trends and abnormal changes, such as a recent change in the frequency of abnormal gatherings of people in a certain area, generates a safety risk assessment report on that basis, and provides detailed data support for subsequent safety management decisions.

The power management module is responsible for providing a stable and reliable power supply for the whole camera system. It can be connected to the mains supply and has a built-in standby power supply, such as a lithium battery, and it has power monitoring, intelligent switching and energy-saving control functions. When the mains supply is cut off, it can automatically switch to the standby power supply to keep the camera working normally, and it can dynamically adjust the power supplied to each module according to actual monitoring requirements.

The storage module is provided with a large-capacity storage medium, such as a solid state disk, and is used for storing collected original video image data, processed key information and analysis result content, supporting data classification storage according to various modes of time and event types, facilitating subsequent query, playback and data tracing operation, and the stored data can be backed up periodically to prevent loss.

The modules of the whole camera system work cooperatively: the image acquisition module continuously acquires images; after in-depth analysis and processing by the Doubao visual large model embedding module and the data processing and analyzing module, the communication module transmits information to the outside, the storage module saves the data, the power management module guarantees the power supply, and the alarm module issues an alarm promptly when necessary, so that a complete intelligent safety management and monitoring system is formed.
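The cooperative data flow between the modules can be summarized in a short orchestration sketch. Every callable here is a hypothetical stand-in for one module; the source does not define any programming interface, and the 0.8 alarm threshold is an assumption.

```python
from typing import Callable, Iterable


def run_pipeline(frames: Iterable[bytes],
                 analyze: Callable[[bytes], dict],
                 assess: Callable[[dict], float],
                 store: Callable[[bytes, dict, float], None],
                 transmit: Callable[[dict, float], None],
                 alarm: Callable[[dict], None],
                 alarm_threshold: float = 0.8) -> int:
    """Push each frame through analysis, assessment, storage, transmission
    and, when needed, the alarm. Returns the number of alarms raised."""
    alarms = 0
    for frame in frames:
        result = analyze(frame)      # visual large model embedding module
        risk = assess(result)        # data processing and analysis module
        store(frame, result, risk)   # storage module
        transmit(result, risk)       # communication module
        if risk >= alarm_threshold:
            alarm(result)            # alarm module
            alarms += 1
    return alarms
```

Passing the modules in as callables keeps the sketch independent of any particular model or network stack, so each stage can be replaced by a stub during testing.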

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the present invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A security management camera module based on a visual model, characterized in that: the security management camera module comprises:

The image acquisition module, which consists of high-definition optical lenses and image sensor components, is responsible for collecting real-time video image information within the monitoring area;

The Doubao visual large model embedding module receives the video image data from the image acquisition module, and its visual large model can accurately identify and classify the objects, people, and behaviors in the image;

Qwen2-VL, which can recognize complex scene object relationships, handwriting, and multi-language image text, has excellent visual reasoning ability and can solve problems through chart analysis. It can understand long videos and support real-time conversations for multiple applications. It supports multiple languages to facilitate global users. With its innovative architecture, it can process images of any resolution, integrate multi-dimensional information, and expand its application capabilities by taking pictures to answer questions.

The data processing and analysis module, in collaboration with the Doubao visual large model embedding module, not only further organizes, judges and integrates the output results of the module, but also compares the real-time and historical monitoring data through algorithms, mines security trends and anomalies, generates evaluation reports, and assists in security management decision-making;

The communication module supports multiple communication protocols, such as Wi-Fi, Ethernet, and 4G/5G. It can transmit the video image data collected by the camera, the analysis results of the Doubao visual model, and the security risk assessment report generated by the data processing and analysis module to the remote monitoring management platform or the mobile terminal of the security personnel in real time, ensuring the timely transmission of security information and facilitating the management personnel to remotely control the status of the monitoring area and respond quickly;

The power management module provides a stable and reliable power supply for the entire camera system. It can be connected to the mains power supply and has a built-in backup power supply such as a lithium battery. It has power monitoring, intelligent switching and energy-saving control functions.

The storage module is equipped with a large-capacity storage medium, such as a solid-state hard disk, which is mainly used to store the original video image data collected by the camera, as well as the key information and analysis results obtained after processing in various links;

The alarm module works with the data processing and analysis module to monitor the safety status of the surveillance area; triggers multiple alarm modes, deters potentially dangerous persons by emitting sound and light signals, and pushes messages and voice prompts to the manager's mobile terminal.

2. According to the security management camera module based on the visual model described in claim 1, it is characterized in that the image acquisition module has functions such as adjustable focal length and aperture, which can adapt to the needs of collecting clear images under different distances and lighting conditions, and the collected image data is transmitted in real time in the form of digital signals to subsequent modules for processing.

3. According to claim 2, a security management camera module based on a visual model is characterized in that: the Doubao visual large model embedding module has a pre-trained Doubao visual large model built in, and the model is deeply trained on a large amount of image data covering different scenes, various objects, and human behaviors, and has powerful image feature extraction, semantic understanding and pattern recognition capabilities; after receiving the video image data from the image acquisition module, the visual large model can accurately identify and classify objects, people, and behaviors in the image, for example, accurately distinguish whether they are normal pedestrians, staff or suspicious persons, and identify specific dangerous items and abnormal scene arrangements, such as blocked fire passages; at the same time, it can also analyze and judge a person's movement trajectory and posture to determine whether there is abnormal behavior, such as climbing or fighting.

4. A security management camera module based on a visual model according to claim 3, characterized in that: Qwen2-VL comprises:

Powerful recognition capabilities, capable of accurately identifying multiple objects and their relationships in complex scenes, as well as handwritten text and multilingual image text including most European languages, Japanese, Korean, and Arabic;

Excellent visual reasoning skills, able to solve complex math problems through graphical analysis, able to extract information from real-world images and diagrams, and better follow instructions to solve real-world problems;

Long video comprehension and real-time conversation: It can understand video content of more than 20 minutes and continuously provide information and support in real-time conversation. It can be applied to video Q&A, conversation and content creation, etc.

Multi-language support: In addition to the common English and Chinese, it also supports understanding image text in multiple languages, making it convenient for users around the world to use;

The architecture is innovative, adopting a serial structure of ViT plus Qwen2 and supporting native dynamic resolution and multimodal rotary position embedding (M-RoPE), and can process image input of any resolution. It can simultaneously capture and integrate multi-dimensional position information, and supports photo-based question answering, AI image generation, making phone calls, and sending messages.

5. According to claim 4, a security management camera module based on a visual model is characterized in that: the data processing and analysis module cooperates with the Doubao visual large model embedding module to, on the one hand, further organize the data and make logical judgments on the recognition and analysis results output by the large model, and associate and integrate the relevant information collected at different times and angles; on the other hand, the real-time monitoring data is compared and analyzed with the historical monitoring data through the built-in algorithm to explore potential security trends and abnormal changes, such as the recent changes in the frequency of abnormal gatherings of people in a certain area, and generate a security risk assessment report based on this, providing detailed data support for subsequent security management decisions.

6. According to claim 5, a security management camera module based on a visual model is characterized in that: a power management module is responsible for providing a stable and reliable power supply for the entire camera system, can be connected to the mains power supply and has a built-in backup power supply, such as a lithium battery, and has power monitoring, intelligent switching and energy-saving control functions. When the mains power is cut off, it can automatically switch to the backup power supply to continue to maintain the normal operation of the camera, and at the same time can dynamically adjust the power supply of each module according to actual monitoring needs.

7. According to claim 6, a security management camera module based on a visual model is characterized in that: the storage module is equipped with a large-capacity storage medium, such as a solid-state hard disk, which is used to store the collected original video image data and the processed key information and analysis results, and supports data classification and storage in multiple ways according to time and event type, which is convenient for subsequent query, playback and data tracing operations, and the stored data can be backed up regularly to prevent loss.

8. According to claim 7, a security management camera module based on a visual model is characterized in that: an alarm module is connected to the data processing and analysis module. When a security incident reaching a preset danger level is determined in the monitoring area based on the analysis and judgment of the Doubao visual large model and the comprehensive data processing results, such as illegal intrusion or fire hazard, the module can trigger a variety of alarm methods, including sending sound and light alarm signals to deter potentially dangerous personnel and sending alarm notifications to the mobile terminals of relevant managers, such as push messages and voice prompts, to ensure that security incidents can be handled in a timely manner.

9. A method for using a visual model-based security management camera, using a visual model-based security management camera module according to any one of claims 1 to 8, characterized in that the method comprises the following steps:

S1. Installation and initialization: Install the camera at a suitable location where security monitoring is required, such as building entrances and exits, corridors, and key passage areas of a park, so that the field of view of the image acquisition module's lens covers the target monitoring area; connect mains power and turn on the camera. The power management module then performs a self-check and supplies power to each module. The camera system starts its initialization program, and the communication module automatically connects to the preset network, for example by joining the local area network or accessing the Internet over a 4G/5G network. The storage module completes its formatting preparation and stands by to store data. The Doubao visual large model embedding module loads the pre-trained model parameters and enters the ready state;

S2. Image acquisition and analysis: The image acquisition module continuously acquires video images of the monitored area at a set frame rate, such as 25 frames per second, and transmits the image data in real time to the Doubao visual large model embedding module. After receiving the images, the Doubao visual large model embedding module performs feature extraction, object recognition, and behavior analysis on each frame. For example, it identifies the persons in the frame and, by comparison with the pre-stored image database of authorized personnel, determines whether their walking direction and behavior are compliant. It also identifies the objects in the scene and their states, such as fire-fighting facilities and vehicles, and outputs the corresponding analysis results to the data processing and analysis module;

S3. Data processing and comprehensive judgment: After the data processing and analysis module collects the analysis results of the Doubao visual large model, on the one hand it integrates the data collected at the same moment by cameras at different angles to build a complete view of the monitored scene; on the other hand, it combines historical monitoring data and uses the built-in risk assessment algorithm to judge the safety status of the current monitoring area, for example by calculating the probability that the persons present are behaving abnormally and the level of safety hazards in the environment, and generates a security risk assessment report. For instance, if a large number of unfamiliar people gather in an area within a short period and behave abnormally, the module compares this with the historical data on normal traffic in that area, judges it to be a high-risk event, and flags it for priority attention;

S4. Information transmission and remote monitoring: The communication module sends the collected original video image data, the analysis results of the Doubao visual large model, and the generated security risk assessment report to the remote monitoring management platform and to the mobile terminals of security personnel according to a set cycle, such as once every minute, or in real time for urgent high-risk events; security personnel can view each camera's monitoring picture, analysis results, and risk assessment in real time through the mobile phone APP or the interface of the monitoring management platform, supervise the monitoring area remotely, and, if a suspicious situation is found, remotely control the camera to zoom and rotate in order to obtain clearer and more accurate image information;

S5. Alarm triggering and response: When the data processing and analysis module determines that a security event reaching the preset alarm threshold has occurred in the monitoring area, such as an unauthorized person attempting to break into a restricted area or the presence of fire or smoke, the alarm module is activated immediately; the sound and light alarm emits a strong sound and light signal to deter potentially dangerous persons, and at the same time an alarm notification containing details of the event, such as its location and type, is sent through the communication module to the mobile terminals of the security personnel and the relevant persons in charge. After receiving the notification, the relevant personnel can rush to the scene to handle the event, or command and dispatch remotely and take corresponding response measures;

S6. Data storage and management: The storage module continuously stores the original video image data collected by the image acquisition module and classifies and archives it by chronological order and by event label, such as individual alarm events and daily patrol periods, to facilitate subsequent query and playback; it also stores the processed analysis results and the key information of the security risk assessment reports, so that management personnel can carry out data statistics and trend analysis and use them as a basis for security management decisions. The stored data is regularly backed up to external storage devices or the cloud to ensure data security and integrity.
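By way of illustration only, the comprehensive judgment of step S3 and the threshold check of step S5 could be sketched as follows. The claims do not disclose the built-in risk assessment algorithm, so everything here is an assumption made for the example: the names `FrameAnalysis`, `assess_risk`, and `alarm_actions`, the three-level risk scale, the 0.7 abnormal-behavior cutoff, and the rule that a crowd is unusual when it exceeds the historical mean by three standard deviations. The sketch also assumes a non-empty history of traffic counts.

```python
from dataclasses import dataclass
from enum import IntEnum
from statistics import mean, pstdev


class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass
class FrameAnalysis:
    """Hypothetical per-interval summary of the visual model's output."""
    unfamiliar_count: int        # people not matched to the authorized database
    abnormal_probability: float  # 0.0 to 1.0, likelihood of abnormal behavior
    fire_or_smoke: bool = False


def assess_risk(current: FrameAnalysis, historical_counts: list[int]) -> RiskLevel:
    """Compare the current interval with historical traffic (step S3)."""
    if current.fire_or_smoke:
        return RiskLevel.HIGH
    baseline = mean(historical_counts)
    spread = pstdev(historical_counts) or 1.0  # fall back when history is flat
    # Assumed rule: a crowd is unusual above mean + 3 standard deviations.
    crowd_is_unusual = current.unfamiliar_count > baseline + 3 * spread
    # Assumed cutoff for treating behavior as abnormal.
    behavior_is_abnormal = current.abnormal_probability >= 0.7
    if crowd_is_unusual and behavior_is_abnormal:
        return RiskLevel.HIGH
    if crowd_is_unusual or behavior_is_abnormal:
        return RiskLevel.MEDIUM
    return RiskLevel.LOW


def alarm_actions(level: RiskLevel,
                  threshold: RiskLevel = RiskLevel.HIGH) -> list[str]:
    """Return the alarm actions for step S5; empty below the preset threshold."""
    if level < threshold:
        return []
    return ["sound_light_alarm", "push_notification", "voice_prompt"]


# Example: a sudden gathering of unfamiliar people with abnormal behavior.
history = [4, 6, 5, 5, 4, 6]
gathering = FrameAnalysis(unfamiliar_count=20, abnormal_probability=0.85)
level = assess_risk(gathering, history)
print(level.name, alarm_actions(level))
```

Keeping the risk judgment separate from the alarm dispatch mirrors the split between the data processing and analysis module and the alarm module, and lets the preset alarm threshold be changed without touching the assessment rule.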