Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application provides a label generation method that improves both the calculation speed and the accuracy of a real-time label. Real-time data and historical data of a subject are stored in different partitions of the same columnar storage area, and the real-time label of the subject is determined from the real-time data together with part of the historical data corresponding to that subject, which improves the accuracy of the real-time label. Because the real-time data and the historical data are stored in the same columnar storage area, both can be read efficiently, which in turn increases the calculation speed of the real-time label.
The specific form of the subject (also referred to herein as the main body) depends on the requirements of the practical application scenario; for example, the subject may be a person or an object. When the subject is a person, the corresponding data is acquired only with the user's authorization.
It can be appreciated that, before the technical solutions disclosed in the embodiments of the present application are used, the user should be informed, in an appropriate manner and in accordance with the relevant laws and regulations, of the type, usage range, usage scenario, etc. of the personal information involved in the present application, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the operation the user is requesting will require the user's personal information to be obtained and used. The user can then decide autonomously, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, an application program, a server, or a storage medium, that executes the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, the prompt information sent to the user in response to receiving an active request from the user may take the form of, for example, a popup window in which the prompt information is presented as text. The popup window may also carry a selection control with which the user chooses to 'agree' or 'disagree' to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
In order to facilitate understanding of the technical solution provided by the embodiments of the present application, the following description will be given with reference to the accompanying drawings.
Referring to fig. 1, the label generation method according to the embodiment of the present application may be performed by a label generating device, which may be a server, an electronic device, or another device, and is not limited herein. The server may be a cloud server, a server cluster, or another device with storage and computing functions. The electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle-mounted terminal, a wearable electronic device, an all-in-one machine, a smart home device, or another device with communication functions, and may also be a virtual machine or an emulated device. As shown in fig. 1, the method may include the following steps:
S101, generating a first request in response to acquiring real-time data of a first subject, wherein the first request is used for requesting generation of the real-time label of the first subject.
In this embodiment, when the tag generating device acquires a piece of real-time data of the first subject, it generates a first request, and the first request triggers generation of the real-time tag of the first subject. That is, the tag generating device may monitor the real-time data of each subject in real time and, when real-time data of a certain subject is acquired, trigger generation of the real-time tag of that subject.
Optionally, in response to acquiring the real-time data of the first subject, a subject identifier of the first subject is acquired, the subject identifier of the first subject is added to a subject identifier queue, and a first request is generated according to the subject identifier queue. The subject identifier queue is used for triggering the calculation of the real-time tag. That is, when the tag generating device acquires real-time data of the first subject, it determines the subject identifier of the first subject, which uniquely indicates the first subject, from certain information in the real-time data. After the subject identifier of the first subject is acquired, it is added to the subject identifier queue. When the tag generating device detects a newly added subject identifier in the subject identifier queue, this indicates that new real-time data exists, and the device generates a first request according to the subject identifier queue. The first request may include the subject identifier of the first subject. In other words, the present embodiment triggers the calculation of the real-time tag by collecting subject identifiers.
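The queue-based trigger described above can be illustrated with a minimal Python sketch. The names `SubjectIdQueue`, `FirstRequest`, and the `subject_id` field are hypothetical and chosen only for illustration; the embodiment does not prescribe how the identifier is derived from the real-time data or how the queue is implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstRequest:
    """Request to compute real-time tags for the listed subject identifiers."""
    subject_ids: tuple


class SubjectIdQueue:
    """Queue of subject identifiers; a new entry signals new real-time data."""

    def __init__(self):
        self._ids = deque()

    def on_realtime_data(self, record: dict) -> None:
        # Assumption: the identifier is carried in a "subject_id" field.
        self._ids.append(record["subject_id"])

    def poll_request(self):
        """Return a FirstRequest for the oldest pending identifier, or None."""
        if not self._ids:
            return None
        return FirstRequest(subject_ids=(self._ids.popleft(),))


queue = SubjectIdQueue()
queue.on_realtime_data({"subject_id": "u-1", "event": "click"})
request = queue.poll_request()
```

In this sketch each queued identifier produces one request; a variant that accumulates identifiers before issuing a request would follow the same structure.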
Optionally, if every piece of real-time data triggered the calculation of a real-time tag, device resources would be heavily occupied. For this reason, the present embodiment triggers the calculation of real-time tags after a small batch of real-time data has accumulated, which raises the achievable queries per second (QPS). Specifically, the first request is generated in response to real-time data of the first subject having been acquired and the request time reaching a preset time; in this case, the first request includes the subject identifier corresponding to each piece of real-time data acquired before the preset time. Alternatively, the first request is generated in response to the number of subjects for which real-time data has been acquired reaching a preset number while the request time has not yet reached the preset time; in this case, the first request includes the subject identifiers of the preset number of subjects. The preset time and the preset number can be set according to the actual application and are not limited in this embodiment.
That is, a batch request is issued either for the subject identifiers accumulated over the preset time, or as soon as the preset number of subject identifiers has accumulated within the preset time. For example, because real-time tags are time-sensitive, the preset time may be 200 milliseconds and the preset number 100: if 100 subject identifiers accumulate within 200 milliseconds, the batch request is issued immediately; if fewer than 100 subject identifiers accumulate within 200 milliseconds, the batch request is issued when the 200 milliseconds elapse. This keeps the access pressure on the storage space as low as possible while preserving timeliness.
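The time-or-count batching rule can be sketched as follows. This is only an illustrative sketch: the class name `MicroBatcher`, its method names, and the injectable `clock` parameter are assumptions introduced here, and the defaults merely mirror the 200 millisecond / 100 identifier example.

```python
import time


class MicroBatcher:
    """Collect subject identifiers and release them by count or by elapsed time."""

    def __init__(self, max_wait_s=0.2, max_count=100, clock=time.monotonic):
        self.max_wait_s = max_wait_s    # preset time
        self.max_count = max_count      # preset number
        self._clock = clock             # injectable so the logic can be tested
        self._pending = []
        self._window_start = None

    def add(self, subject_id):
        """Record one identifier; return a batch if the count threshold is hit."""
        if not self._pending:
            self._window_start = self._clock()
        self._pending.append(subject_id)
        if len(self._pending) >= self.max_count:
            return self._flush()
        return None

    def tick(self):
        """Call periodically; return a batch if the time threshold is hit."""
        if self._pending and self._clock() - self._window_start >= self.max_wait_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self._pending = self._pending, []
        self._window_start = None
        return batch
```

A caller would invoke `add` for each incoming identifier and `tick` from a timer loop, issuing one batch request whenever either call returns a non-empty list.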
S102, acquiring real-time data of a first main body from the real-time storage partition according to a first request and acquiring first historical data of the first main body from the historical storage partition.
In this embodiment, the storage engine that stores subject data is divided into a real-time storage partition and a history storage partition, which store the real-time data and the history data of the subject, respectively. The tag generation device obtains the real-time data of the first subject from the real-time storage partition according to the first request and obtains the first historical data of the first subject from the history storage partition. The time range of the data held in the real-time storage partition is set according to actual conditions; for example, the real-time storage partition may store the data generated on the current day.
Optionally, the storage engine is a columnar storage engine. Storing the data in a columnar storage engine speeds up subsequent queries and therefore improves the calculation efficiency of the real-time tag. For example, if the columnar storage engine is ClickHouse, a ClickHouse table is partitioned into a real-time storage partition and a history storage partition.
Optionally, in a big data scenario, the acquired data is generally stored in a distributed manner. To ensure that the data of the same subject at different moments is stored on the same storage node, this embodiment stores the data of each subject by bucketing. Specifically, in response to acquiring real-time data of a certain subject, the subject identifier corresponding to the real-time data is determined, a target real-time storage partition corresponding to the subject is determined according to the subject identifier, and the real-time data is stored in the target real-time storage partition. For example, the real-time data is stored in a first real-time storage partition in response to the subject identifier corresponding to the real-time data being the subject identifier of a first subject, and is stored in a second real-time storage partition in response to the subject identifier corresponding to the real-time data being the subject identifier of a second subject.
Similarly, for the historical data, determining a main body identification corresponding to the historical data, determining a target historical storage partition according to the main body identification, and storing the historical data into the target historical storage partition.
For example, in the bucketing schematic shown in fig. 2, offline service data is processed in T+1 mode: after historical data is obtained, the subject identifier (one id) corresponding to the historical data is calculated, and the storage area corresponding to the historical data is determined through a hash calculation. Taking N as the number of clickhouse local tables, the historical data is stored in clickhouse local 0 when hash(one id) % N = 0, in clickhouse local 1 when hash(one id) % N = 1, in clickhouse local 2 when hash(one id) % N = 2, and so on. For real-time data, the subject identifier (one id) corresponding to the real-time data is likewise calculated, and the storage area corresponding to the real-time data is determined through the same hash calculation. As can be seen from fig. 2, the historical data and the real-time data corresponding to the same subject identifier are stored in the same clickhouse local table. In addition, for real-time data, the corresponding id stream is determined according to the subject identifier, and the subject identifier is added to that id stream, which in turn triggers the calculation of the real-time tag.
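The bucket assignment can be sketched in Python as follows. The sketch assumes an MD5-based stable hash and a hypothetical `clickhouse_local_<n>` naming scheme; the embodiment does not specify which hash function is used, and this code does not use ClickHouse's own sharding functions.

```python
import hashlib


def bucket_for(subject_id: str, num_buckets: int) -> int:
    """Map a subject identifier to a bucket index in [0, num_buckets)."""
    # A digest-based hash is stable across processes, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    digest = hashlib.md5(subject_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def local_table_for(subject_id: str, num_buckets: int) -> str:
    # Hypothetical naming scheme for the per-node local tables.
    return f"clickhouse_local_{bucket_for(subject_id, num_buckets)}"
```

Because the same function is applied to both historical and real-time records, all data for one subject identifier resolves to the same local table, which is the property the bucketing scheme relies on.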
It should be noted that, to increase the calculation speed of the real-time tag, batch requests can be made on top of the bucketing. That is, subject identifiers are accumulated and ClickHouse is queried at a fixed interval or at a fixed quantity, so that the real-time tags of a batch of subjects are obtained at one time.
Optionally, when the real-time data of the first subject is acquired from the real-time storage partition according to the first request, the first real-time storage partition corresponding to the first subject is determined according to the subject identifier in the first request, and the real-time data of the first subject is acquired from the first real-time storage partition. The first real-time storage partition is used for storing real-time data of the first subject. That is, the real-time data of a given subject is stored in the real-time storage partition corresponding to that subject.
Optionally, when the history data of the first subject is obtained from the history storage partition according to the first request, determining the first history storage partition corresponding to the first subject according to the subject identifier in the first request, and obtaining the first history data of the first subject from the first history storage partition. The first history storage partition is used for storing first history data of a first main body. That is, history data of a certain subject is stored in its own corresponding history storage partition.
The first history data is determined according to a real-time tag calculation rule. For example, if the real-time tag calculation rule is to use the history data of the previous 6 days, the first history data is the history data of the first subject for the previous 6 days in the first history storage partition. As another example, if the real-time tag calculation rule is to use the history data of the previous 10 days, the first history data is the history data of the first subject for the previous 10 days in the first history storage partition.
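The read path of S102 can be sketched with an in-memory stand-in for the storage engine. The class `SubjectStore`, its two dictionaries, and the `fetch` signature are hypothetical; a real deployment would issue queries against the columnar storage engine instead.

```python
from datetime import date, timedelta


class SubjectStore:
    """In-memory stand-in for one storage engine with two partitions."""

    def __init__(self):
        self.realtime = {}   # subject_id -> list of records for the current day
        self.history = {}    # subject_id -> {day: list of records}

    def fetch(self, subject_id, today, history_days):
        """Return (real-time data, first history data) for one subject.

        history_days is the window from the real-time tag calculation rule,
        e.g. 6 for "the previous 6 days" (today itself is excluded).
        """
        realtime = list(self.realtime.get(subject_id, []))
        per_day = self.history.get(subject_id, {})
        first_history = []
        for offset in range(1, history_days + 1):
            first_history.extend(per_day.get(today - timedelta(days=offset), []))
        return realtime, first_history
```

Only the days inside the rule's window are read, so older history for the same subject stays untouched by the real-time calculation.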
S103, determining the real-time label corresponding to the first main body according to the real-time data and the first historical data.
After the real-time data and the first historical data corresponding to the first main body are obtained, determining the real-time label corresponding to the first main body according to the real-time data and the first historical data.
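The embodiment does not fix a particular calculation rule, so the following Python sketch uses a purely hypothetical one: the tag is "high_activity" when the summed `amount` over the real-time data and the first historical data reaches a threshold. The field name, the tag names, and the threshold are all illustrative assumptions.

```python
def determine_realtime_tag(realtime, first_history, threshold=100.0):
    """Derive a tag from real-time data combined with part of the history.

    Hypothetical rule: compare the summed "amount" field with a threshold.
    """
    total = sum(r["amount"] for r in realtime)
    total += sum(r["amount"] for r in first_history)
    return "high_activity" if total >= threshold else "normal_activity"


# Real-time data alone falls below the threshold ...
tag_without_history = determine_realtime_tag([{"amount": 60}], [])
# ... while the same data combined with recent history crosses it.
tag_with_history = determine_realtime_tag([{"amount": 60}], [{"amount": 50}])
```

The two calls show why combining the real-time data with part of the history can change the resulting tag compared with using the real-time data alone.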
Optionally, after the real-time tag corresponding to the first subject is determined, the real-time tag is stored in the real-time tag set.
Optionally, the embodiment further supports calculation of an offline label. Specifically, in response to a triggering operation of a user, a second request is generated, the second request being used for requesting the offline label of the first subject; second historical data of the first subject is obtained from the history storage partition according to the second request; and the offline label corresponding to the first subject is determined according to the second historical data. The second historical data is determined according to an offline label calculation rule and may be the same as or different from the first historical data. For example, if the offline label calculation rule is to determine the offline label according to the historical data of the previous month, the second historical data is the data generated by the first subject in the previous month. After the offline label is obtained, it may be stored in an offline label set.
In addition, to avoid wasting storage space on a real-time tag that duplicates the offline tag, when the real-time tag is to be stored, the real-time tag of the first subject is compared with the offline tag of the first subject, and the real-time tag of the first subject is stored in the real-time tag set in response to the real-time tag and the offline tag being different.
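The compare-before-store step can be sketched as follows, assuming for illustration that both tag sets are plain dictionaries keyed by subject identifier; the function name `store_realtime_tag` is hypothetical.

```python
def store_realtime_tag(subject_id, realtime_tag, offline_tags, realtime_tags):
    """Store the real-time tag only if it differs from the offline tag.

    Returns True when the tag was written, False when it was skipped.
    """
    if offline_tags.get(subject_id) == realtime_tag:
        return False  # identical to the offline tag, so storing it adds nothing
    realtime_tags[subject_id] = realtime_tag
    return True


offline_tags = {"u-1": "normal_activity"}
realtime_tags = {}
skipped = store_realtime_tag("u-1", "normal_activity", offline_tags, realtime_tags)
written = store_realtime_tag("u-1", "high_activity", offline_tags, realtime_tags)
```

How a previously stored real-time tag should be handled when a later one matches the offline tag is not addressed by the embodiment, and the sketch leaves such an entry in place.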
When the user needs to acquire the offline label of the first main body, the offline label of the first main body is read from the offline label set.
It can be seen that, when real-time data of a first subject is received, a first request is generated, the first request being used for requesting generation of a real-time tag of the first subject. The real-time data of the first subject is acquired from the real-time storage partition according to the first request, and the first historical data of the first subject is acquired from the history storage partition. The real-time tag corresponding to the first subject is then determined according to the real-time data and the first historical data. Because the real-time storage partition and the history storage partition are located in the same storage engine, the real-time data and the first historical data can be read more efficiently, which further increases the calculation speed of the real-time tag. In addition, the real-time tag is calculated from the real-time data together with part of the historical data rather than from the real-time data alone, so the real-time tag better reflects the characteristics of the first subject and its accuracy is improved.
To facilitate understanding of the present application, refer to the application scenario diagram shown in fig. 3, in which real-time data of 3 subjects is present in total. The real-time data of each subject follows the path Kafka, ETL, Kafka, data storage, real-time tag calculation. That is, the subject identifier of the real-time data of each subject is added to the id stream, which triggers the tag calculation. Because the data storage uses a columnar storage engine, storage of the real-time data of multiple subjects can be supported.
Based on the above method example, the embodiment of the present application provides a label generating apparatus, and will be described below with reference to the accompanying drawings.
Referring to fig. 4, which is a block diagram of a tag generating apparatus according to an embodiment of the present application, as shown in fig. 4, the apparatus 400 may include a generating unit 401, an acquiring unit 402, and a determining unit 403.
A generating unit 401, configured to generate, in response to acquiring real-time data of a first subject, a first request, where the first request is used to request a real-time tag of the first subject;
an obtaining unit 402, configured to obtain, according to the first request, real-time data of the first main body from a real-time storage partition and obtain, from a history storage partition, first history data of the first main body, where the real-time storage partition and the history storage partition are located in a same storage engine;
a determining unit 403, configured to determine a real-time tag corresponding to the first body according to the real-time data and the first history data.
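One way to picture how the three units cooperate is the following Python sketch, in which each unit is modelled as a callable supplied by the caller. The class `TagGeneratingApparatus`, the `handle` method, and the stub callables are hypothetical and serve only to show the data flow between units 401, 402, and 403.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TagGeneratingApparatus:
    generate: Callable   # generating unit 401: real-time record -> first request
    obtain: Callable     # obtaining unit 402: request -> (realtime, first_history)
    determine: Callable  # determining unit 403: (realtime, first_history) -> tag

    def handle(self, record):
        """Run one real-time record through the three units in order."""
        request = self.generate(record)
        realtime, first_history = self.obtain(request)
        return self.determine(realtime, first_history)


# Stub units with fixed data, for illustration only.
apparatus = TagGeneratingApparatus(
    generate=lambda record: {"subject_id": record["subject_id"]},
    obtain=lambda request: ([{"amount": 60}], [{"amount": 50}]),
    determine=lambda rt, hist: (
        "high_activity" if sum(r["amount"] for r in rt + hist) >= 100
        else "normal_activity"
    ),
)
tag = apparatus.handle({"subject_id": "u-1"})
```

Keeping the units as separate callables reflects the point that the division into units is logical and that the units may be integrated or split differently in practice.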
In an optional implementation manner, the generating unit 401 is specifically configured to obtain a body identifier of a first body in response to obtaining real-time data of the first body, add the body identifier of the first body to a body identifier queue, and generate a first request according to the body identifier queue, where the body identifier queue is used to trigger calculation of a real-time tag.
In an optional implementation manner, the obtaining unit 402 is specifically configured to determine, according to a body identifier in the first request, a first real-time storage partition corresponding to the first body, where the first real-time storage partition is used to store real-time data of the first body, and obtain the real-time data of the first body from the first real-time storage partition.
In an alternative implementation, the apparatus further includes a storage unit;
the determining unit 403 is further configured to determine, in response to acquiring real-time data, a subject identifier corresponding to the real-time data;
the storage unit is used for responding to the main body identification corresponding to the real-time data as the main body identification of the first main body and storing the real-time data into the first real-time storage partition.
In an optional implementation manner, the generating unit 401 is specifically configured to generate the first request in response to acquiring real-time data of the first body and the request time reaching the preset time, or to generate the first request in response to the number of bodies for which real-time data has been acquired reaching the preset number while the request time has not reached the preset time.
In an optional implementation manner, the obtaining unit 402 is specifically configured to determine, according to a body identifier in the first request, a first history storage partition corresponding to the first body, where the first history storage partition is used to store first history data of the first body, and obtain the first history data of the first body from the first history storage partition.
In an alternative implementation, the apparatus further includes a storage unit;
The determining unit 403 is further configured to determine, for any historical data, a body identifier corresponding to the historical data;
The storage unit is used for responding to the main body identification corresponding to the historical data as the main body identification of the first main body and storing the historical data into the first historical storage partition.
In an optional implementation manner, the generating unit 401 is further configured to generate, in response to a triggering operation of a user, a second request, where the second request is used to request an offline label of the first body;
The obtaining unit 402 is further configured to obtain, according to the second request, second history data of the first body from the history storage partition;
the determining unit 403 is further configured to determine an offline tag corresponding to the first body according to the second history data.
In an alternative implementation, the apparatus further includes a storage unit;
the storage unit is further configured to store the real-time tag corresponding to the first body in a real-time tag set.
In an optional implementation manner, the storage unit is specifically configured to store the real-time tag of the first body in the real-time tag set in response to the real-time tag corresponding to the first body being different from the offline tag.
It should be noted that, the implementation of each unit in this embodiment may refer to the related description in the foregoing method embodiment, and this embodiment is not described herein again.
The division of the units in the embodiment of the application is schematic and is merely a division by logical function; other division modes can be adopted in actual implementation. The functional units in the embodiment of the application can be integrated in one processing unit, or each unit can exist alone physically, or two or more units can be integrated in one unit. For example, in the above embodiment, the processing unit and the transmitting unit may be the same unit or may be different units. The integrated units may be implemented in the form of hardware or in the form of software functional units.
Referring to fig. 5, a schematic diagram of an electronic device 500 suitable for use in implementing embodiments of the present application is shown. The terminal device in the embodiment of the present application may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a car-mounted terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV (television), a desktop computer, and the like. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 508 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the method of the embodiment of the present application are performed when the computer program is executed by the processing means 501.
The electronic device provided by the embodiment of the present application belongs to the same inventive concept as the method provided by the above embodiment, and technical details not described in detail in the present embodiment can be seen in the above embodiment, and the present embodiment has the same beneficial effects as the above embodiment.
An embodiment of the present application provides a computer-readable medium, on which a computer program is stored, wherein the program, when being executed by a processor, implements a method as described in any of the above embodiments.
The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method described above.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a unit/module does not constitute a limitation on the unit itself; for example, the voice data acquisition module may also be described as a "data acquisition module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the embodiments in the present description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the system or device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
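The "at least one of a, b, or c" definition above can be read as the set of all non-empty combinations of the listed items. The following is a minimal illustrative sketch of that reading only; the function name at_least_one_of is hypothetical and is not part of the application.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of the given items, smallest first.

    Hypothetical helper used only to illustrate the "at least one of"
    definition; it is not part of the described method.
    """
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = at_least_one_of(["a", "b", "c"])
for combo in combos:
    # Join the members of each combination the way the text lists them.
    print(" and ".join(combo))
```

For three items the sketch yields seven combinations, matching the seven cases enumerated in the text.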
It is further noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.