CN120336628A - Table data processing method and device, computer equipment, storage medium, program product

Table data processing method and device, computer equipment, storage medium, program product

Info

Publication number
CN120336628A
Authority
CN
China
Prior art keywords
data
data row
row
target
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510407757.0A
Other languages
Chinese (zh)
Inventor
邓范鑫
王瑞
顾世浩
孙浩飞
何乐
贺冠瑞
张滨凯
陈宇豪
王翛
尹邵龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202510407757.0A priority Critical patent/CN120336628A/en
Publication of CN120336628A publication Critical patent/CN120336628A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a table data processing method and apparatus, a computer device, a storage medium, and a program product. The method includes: in response to an associated data recommendation request for a target data row in a first data table, determining characteristic information of the target data row; according to the characteristic information, acquiring at least one first data row matching the characteristic information in the first data table and/or acquiring at least one second data row matching the characteristic information in a second data table associated with the first data table; determining recommended associated data of the target data row according to the at least one first data row and/or the at least one second data row matching the characteristic information; and generating an associated data recommendation result for the target data row according to the recommended associated data. The table data processing method and apparatus, computer device, storage medium, and program product provided by the present disclosure can improve the coverage and accuracy of data recommendation to a certain extent.

Description

Table data processing method and device, computer equipment, storage medium and program product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for processing table data, a computer device, a storage medium, and a program product.
Background
With the continuous development of Internet technology, many tasks can now be handled intelligently. For example, tables may be created and maintained with collaborative office software.
In the related art, a certain field of a certain data row in a data table may be associated with one or more data rows of another data table, so that information of another data table associated therewith may be conveniently viewed in the data table.
However, the inventors of the present disclosure found that, in the related art, finding the associated records in a table is cumbersome.
Disclosure of Invention
The present disclosure proposes a table data processing method and apparatus, a computer device, a storage medium, and a program product to solve or partially solve the above-mentioned problems.
In a first aspect of the present disclosure, a table data processing method is provided, including:
determining characteristic information of a target data row in a first data table in response to an associated data recommendation request for the target data row;
according to the characteristic information, acquiring at least one first data row matching the characteristic information in the first data table, and/or acquiring at least one second data row matching the characteristic information in a second data table associated with the first data table;
determining recommended associated data of the target data row according to the at least one first data row and/or the at least one second data row matching the characteristic information; and
generating an associated data recommendation result of the target data row according to the recommended associated data.
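The four steps of the first aspect can be read as a small pipeline. The following Python is a minimal sketch only: the function names (`extract_features`, `match_rows`, `recommend`), the dictionary-based row representation, the token-overlap matching rule, and the use of a simple union as the "recommended associated data" are all assumptions made for illustration and are not specified by the disclosure.

```python
from typing import Dict, List, Set

Row = Dict[str, str]  # hypothetical row representation: field name -> cell text


def extract_features(row: Row) -> Set[str]:
    """Step 1 (assumed form): lower-cased cell tokens act as characteristic information."""
    return {tok.lower() for value in row.values() for tok in value.split()}


def match_rows(features: Set[str], table: List[Row]) -> List[Row]:
    """Step 2 (assumed rule): a row matches if it shares at least one token."""
    return [r for r in table if features & extract_features(r)]


def recommend(target: Row, first_table: List[Row], second_table: List[Row]) -> List[Row]:
    """Steps 1-4: features -> matches in both tables -> recommended rows -> result."""
    features = extract_features(target)
    # The target row itself is excluded from the first-table matches.
    first_matches = [r for r in match_rows(features, first_table) if r is not target]
    second_matches = match_rows(features, second_table)
    # Step 3 (assumed): recommended associated data is the union of both match lists.
    recommended = first_matches + second_matches
    # Step 4: the recommendation result is the list presented to the user.
    return recommended
```

In this sketch either table may be passed as an empty list, which corresponds to the "and/or" wording of the second step.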
In a second aspect of the present disclosure, there is provided a table data processing apparatus including:
a first determining module configured to determine characteristic information of a target data row in a first data table in response to an associated data recommendation request for the target data row;
an acquisition module configured to acquire, according to the characteristic information, at least one first data row matching the characteristic information in the first data table and/or at least one second data row matching the characteristic information in a second data table associated with the first data table;
a second determining module configured to determine recommended associated data of the target data row according to the at least one first data row and/or the at least one second data row matching the characteristic information; and
a generation module configured to generate an associated data recommendation result of the target data row according to the recommended associated data.
In a third aspect of the present disclosure, there is provided a computer device comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the one or more programs comprising instructions for performing the method according to the first aspect.
In a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium containing a computer program which, when executed by one or more processors, causes the one or more processors to perform the method of the first aspect.
In a fifth aspect of the present disclosure, there is provided a computer program product comprising one or more computer programs which, when executed by one or more processors, implement the method of the first aspect.
According to the table data processing method and apparatus, computer device, storage medium, and program product provided by the present disclosure, at least one first data row matching the characteristic information is acquired in the first data table, and/or at least one second data row matching the characteristic information is acquired in the second data table associated with the first data table, according to the characteristic information of the target data row. Data is then recommended automatically based on the matched first and/or second data rows, so that a user can conveniently find associated records.
Drawings
In order to illustrate the technical solutions of the present disclosure or the related art more clearly, the drawings required for describing the embodiments or the related art are briefly introduced below. The drawings in the following description show only embodiments of the present disclosure, and those of ordinary skill in the art may obtain other drawings from them without inventive effort.
Fig. 1 shows a schematic diagram of an exemplary system provided by an embodiment of the present disclosure.
Fig. 2A shows a schematic diagram of an exemplary page according to an embodiment of the present disclosure.
Fig. 2B shows another schematic diagram of an exemplary page according to an embodiment of the present disclosure.
Fig. 3A shows a flow diagram of an exemplary tabular data processing method provided by an embodiment of the present disclosure.
Fig. 3B shows a flow diagram of a method of determining at least one first data row matching the characteristic information according to an embodiment of the present disclosure.
Fig. 3C shows a flow diagram of a method of determining at least one second data row matching the characteristic information according to an embodiment of the present disclosure.
Fig. 3D shows a flow diagram of a method of determining a third candidate data row according to an embodiment of the present disclosure.
Fig. 3E shows a flow diagram of a method of determining a fourth candidate data row according to an embodiment of the present disclosure.
Fig. 3F shows a flow diagram of a method of determining recommended associated data corresponding to a target field of a target data row according to an embodiment of the present disclosure.
Fig. 3G shows a flow diagram of another method of determining recommended associated data corresponding to a target field of a target data row according to an embodiment of the present disclosure.
Fig. 3H shows a flow diagram of yet another method of determining recommended associated data corresponding to a target field of a target data row according to an embodiment of the present disclosure.
Fig. 3I shows a flow diagram of yet another method of determining recommended associated data corresponding to a target field of a target data row according to an embodiment of the present disclosure.
Fig. 4A shows a schematic diagram of an exemplary page according to an embodiment of the present disclosure.
Fig. 4B shows another schematic diagram of an exemplary page according to an embodiment of the present disclosure.
Fig. 5 shows a hardware architecture diagram of an exemplary computer device provided by an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of an exemplary apparatus provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure have the ordinary meaning understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present disclosure, do not denote any order, quantity, or importance, but are used only to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
It will be appreciated that before using the technical solutions of the various embodiments in the disclosure, the user may be informed of the type of personal information involved, the range of use, the use scenario, etc. in an appropriate manner, and obtain the authorization of the user.
For example, in response to receiving an active request from a user, prompt information is sent to the user to state explicitly that the operation the user is requesting will require the user's personal information to be obtained and used. The user can then decide, according to the prompt information, whether to provide personal information to the software or hardware that performs the operations of the technical solution of the present disclosure, such as an electronic device, an application program, a server, or a storage medium.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the prompt information may be sent to the user in, for example, a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control with which the user chooses to "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative, and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
Fig. 1 shows a schematic diagram of an exemplary system 100 provided by an embodiment of the present disclosure.
As shown in fig. 1, the system 100 may be used to perform functions such as table creation and maintenance, and may include terminal devices 102A, 102B, a server 106, and a database server 108. The system 100 may further include a medium (e.g., a network) that provides communication links between the terminal devices 102A, 102B and the server 106 and database server 108. The network may include various connection types, such as wired or wireless communication links, or fiber optic cables.
Various Applications (APP) or software, such as a collaborative office-type application or software, an image processing-type application or software, a video conference-type application or software, a reading-type application or software, a video-type application or software, a social-type application or software, a payment-type application or software, a web browser, an instant messaging tool, etc., may be installed on the terminal devices 102A, 102B. In some embodiments, these applications or software may be used for table creation and maintenance, etc.
The terminal devices 102A, 102B may be hardware or software. When the terminal devices 102A, 102B are hardware, they may be various electronic or computer devices with display screens, including but not limited to smartphones, tablets, electronic book readers, MP3 players, laptop computers, desktop computers (PCs), and the like. When the terminal devices 102A, 102B are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
The server 106 may be a server that provides various services, such as a background server that supports the applications or software displayed on the terminal devices 102A, 102B. The database server 108 may likewise be a database server that provides various services. It will be appreciated that the database server 108 may be omitted from the system 100 where the server 106 can implement the relevant functions of the database server 108.
The server 106 and database server 108 may be hardware or software. When they are hardware, they may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that, the table data processing method provided by the embodiment of the present disclosure may be executed by the terminal devices 102A and 102B or interactively executed by each device in the system 100. It should be understood that the number of terminal devices, users, servers and database servers in fig. 1 is merely illustrative. There may be any number of terminal devices, users, servers, and database servers, as desired for implementation.
In some exemplary scenarios, the user 104A or the user 104B may create and maintain the table through collaborative office software or applications installed in the terminal device 102A or the terminal device 102B, respectively. Alternatively, the user 104A and the user 104B may co-maintain the same table, e.g., view, add, modify, or delete data rows in the table. Alternatively, the table may be a table for multidimensional management of one item and may therefore include a plurality of data tables in a plurality of dimensions, each of which may store the data of its own dimension and correspond to the other data tables.
As an alternative embodiment, the user 104A and/or the user 104B may also associate a data row of the B data table with a field of a data row of the A data table, so that the information of the associated B data table row can be viewed from that field in the A data table.
In some embodiments, the association may include a one-way association and a two-way association.
Unidirectional association may mean that certain fields in the table support a one-way association function, i.e., a data row of the B data table is associated in the A data table. After the association is completed, the user can click the field in the A data table to view the entire row of information of the associated data row, and can further jump to the B data table associated with the field for viewing. In some embodiments, a unidirectionally associated field may also be associated with a data row in the current data table (i.e., the A data table).
Bidirectional association may mean that certain fields in the table support a two-way association function. Using a bidirectional association field, one or more data rows of the B data table may be associated in the A data table, and each associated row of the B data table is then automatically associated back with the corresponding row of the A data table. The user can view the data of the B data table directly in the A data table, jump to the B data table, and jump back from the B data table to the A data table with one click.
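The difference between the two association types can be illustrated with a toy link store. This is a sketch under stated assumptions: the `AssociationStore` class, the `(table name, row id)` keying scheme, and the `bidirectional` flag are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# (table name, row id) identifies a data row; this keying scheme is hypothetical.
RowKey = Tuple[str, str]


class AssociationStore:
    """Toy store of row-to-row links held in association fields."""

    def __init__(self) -> None:
        self.links: DefaultDict[RowKey, Set[RowKey]] = defaultdict(set)

    def associate(self, source: RowKey, target: RowKey, bidirectional: bool = False) -> None:
        # One-way: only the source row records the link.
        self.links[source].add(target)
        if bidirectional:
            # Two-way: the target row is automatically linked back to the source.
            self.links[target].add(source)

    def associated(self, row: RowKey) -> Set[RowKey]:
        """Return a copy of the rows linked from the given row."""
        return set(self.links.get(row, set()))
```

With a one-way link, only the A data table row can reach the B data table row; with a two-way link, the B data table row can also reach back to the A data table row, which matches the jump-back behaviour described above.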
It follows that when the user 104A and/or the user 104B performs an association operation in the current data table (e.g., the A data table), the user must first find the data row to be associated in another data table (e.g., the B data table) and then associate the two through the association operation. However, the inventors of the present disclosure have found that when the other data table (e.g., the B data table) has a large number of data rows, the user may need to browse through its data rows repeatedly to find the one to be associated, which is inconvenient.
As previously described, the user 104A and/or the user 104B may utilize collaborative office-like software or applications installed in the terminal devices 102A, 102B to create and maintain tables.
Illustratively, in an initial state, the user 104A may open the collaborative office class software or application using the terminal device 102A and further open the form to be maintained or create a new form.
In some embodiments, the table may include multiple data tables in multiple dimensions. For example, the table may be an order and merchandise statistics table, and thus may further include a data table for counting orders and a data table for counting merchandise, and other related data tables, and so forth.
Illustratively, the user 104A may further open one of the data tables for data maintenance, e.g., by triggering a tab corresponding to the data table in the table.
Fig. 2A shows a schematic diagram of an exemplary page 200, according to an embodiment of the present disclosure.
As shown in fig. 2A, the user 104A triggers the first data table so that the first data table 202 is displayed in the page 200. The first data table 202 may include a plurality of first data rows 2022 (illustratively, 12 first data rows 2022 are shown in fig. 2A). Illustratively, the first data table 202 may be an order table for recording orders; in other words, each data row in the order table records one order. Thus, a data row in a data table may also be referred to as a record, while data rows of other data tables associated with the current data table (or a field thereof) may also be referred to as associated records.
In some embodiments, as shown in fig. 2A, the first data table 202 may further include a plurality of fields, each of which may correspond to one item of attribute information of a first data row 2022. Taking the order table as an example, it may include fields such as an order number, an order status, a responsible person, a service line, an order time, and a commodity library. Each first data row 2022 is filled in with the corresponding field information, so that it records one order.
Optionally, as shown in fig. 2A, a field addition control 204 may be further included in the page 200, and the user 104A may enrich the contents of the first data table 202 by triggering the field addition control 204 to add a new field to the first data table 202.
In some embodiments, when a particular field supports a one-way or two-way association function, the user 104A may associate that field with a data row in another data table. Illustratively, as shown in fig. 2A, the commodity library field may be a field that supports one-way or two-way association, so that the user 104A may perform an association operation on the commodity library field of a data row. Alternatively, whether a field can be associated with other data tables, which data tables it can be associated with, and whether the association is unidirectional or bidirectional can all be preconfigured. Illustratively, the commodity library field may be associated with a commodity table, such that a corresponding data row may be looked up in the commodity table and associated into the commodity library field of the first data table 202.
Illustratively, the user 104A may generate a data association request by triggering (e.g., clicking or double-clicking) the target field 2024 of the target data row 2022A of the first data table 202 in the page 200. In fig. 2A, the target data row 2022A is, for example, the third row of the first data table 202 (which may be the data row the user is currently editing), and the target field 2024 is the cell of that row under the commodity library field. In response to the data association request, the terminal device 102A may display the first page 210, as shown in fig. 2B.
In some embodiments, as shown in fig. 2B, the first page 210 may be displayed floating over the page 200. Compared with jumping to another page, this lets the user 104A feel that data maintenance for the first data table 202 is still being performed within the page 200, which better matches user habits and improves the user experience.
As shown in fig. 2B, optionally, the second data table 212 associated with the first data table 202 may be further displayed in the first page 210, so that the user 104A may find the second data row 2122 of the second data table 212 that the user wants to associate with in the first page 210. Optionally, the second data table 212 is also associated with the target field 2024, and the association of the second data table 212 with the target field 2024 may be preconfigured on the field corresponding to the target field 2024, so that when the target field 2024 is triggered, the first page 210 may present the content of the second data table 212. Illustratively, the second data table 212 may be a commodity table for recording commodity information, which may be associated with the commodity library field of the first data table 202.
As shown in fig. 2B, the second data table 212 may include a plurality of second data rows 2122. When the display range of the first page 210 is insufficient to display all of the second data rows 2122 of the second data table 212, in some embodiments, a scroll bar 2124 may be displayed on one side of the second data table 212. The scroll bar 2124 is used to scroll the second data table 212 in a first direction (e.g., the vertical direction of the first page 210), so that the user 104A can browse the other second data rows 2122 in the second data table 212 by moving the scroll bar 2124.
In some embodiments, as shown in fig. 2B, a search field 214 may also be included in the first page 210, such that the user 104A may search the second data table 212 for a second data row 2122 that the user wants to associate by entering a search term in the search field 214, thereby facilitating the user to find the second data row 2122 that the user wants to associate more quickly. Alternatively, the search bar 214 may be displayed on top of the first page 210 so that the user 104A may more easily view the search bar 214 and more conveniently use it.
However, the inventors of the present disclosure have found that when the number of second data rows 2122 of the second data table 212 is large, the first page 210 cannot show the entire contents of the second data table 212. In particular, when the number of second data rows 2122 is very large, it may be difficult for the user 104A to find the second data row to be associated. Even when the search bar 214 is used, suitable results may not be found because the search term is imprecise.
In some embodiments, data rows that may be associated can be recommended to the user based on keyword retrieval, vector similarity calculation, data analysis, and the like. However, the inventors of the present disclosure have found that recommending with any one of these methods alone (i.e., with a single recall strategy) has certain limitations, such as insufficient coverage, low accuracy, and insufficient personalization. Specifically, a single recall strategy can hardly cover all possible association scenarios and may miss some valuable associated records. It may also be affected by factors such as data sparsity, semantic ambiguity, and noisy data, which makes the recommendation result inaccurate. Finally, it can hardly take full account of users' individual needs and scenario differences, so the recommendation result tends to be generic and insufficiently targeted.
In view of this, the embodiments of the present disclosure provide a table data processing method that fuses at least two recall strategies, which may improve the coverage and accuracy of data recommendation to a certain extent and thereby solve or partially solve the above-mentioned problems.
Fig. 3A shows a flow diagram of an exemplary tabular data processing method 300 provided by an embodiment of the present disclosure. The method 300 may be used to recommend associable data lines. Alternatively, the method 300 may be implemented by the terminal device 102A or 102B of FIG. 1 alone, or by the server 106 of FIG. 1 alone, or interactively by the devices in the system 100 of FIG. 1. As shown in fig. 3A, the method 300 may further include the following steps.
In step 302, characteristic information of a target data row 2022A may be determined in response to an associated data recommendation request for the target data row 2022A in the first data table 202.
Alternatively, the associated data recommendation request may be triggered by clicking the recommendation control 216 of the first page 210, so that the terminal device 102A learns that the user 104A wants associated data recommended for the target data row 2022A. In some embodiments, the first page 210 is displayed by triggering the target field 2024 of the target data row 2022A, so the recommendation control 216 may be used to recommend associated data for that target field 2024. In this case, the associated data recommendation request is a request to recommend data for the target field 2024 of the target data row 2022A, and the recommended associated data obtained is to be associated to that field. In some embodiments, the associated data recommendation request may also be generated when the user 104A triggers the target data row 2022A in the page 200 of fig. 2A (e.g., clicks or double-clicks the target data row 2022A), so that the recommended associated data may be associated into the target data row 2022A (e.g., by selecting, after the recommended associated data is obtained, the field to which it should be associated).
In this step, the characteristic information of the target data row 2022A may be any features that express the target data row 2022A, so that data rows associated with the target data row 2022A can be obtained based on it. It will be appreciated that the characteristic information may be obtained from the target data row 2022A by any feature extraction method, as long as the characteristic information expresses the characteristics of the target data row 2022A.
In some embodiments, determining the characteristic information of the target data row may further include extracting a plurality of keywords from the target data row and determining the characteristic information according to the plurality of keywords. Because the extracted keywords characterize the target data row 2022A, they can serve as its characteristic information.
It will be appreciated that keywords may be extracted in a variety of ways. In some embodiments, the target data row 2022A may be input into a keyword extraction model, which outputs a plurality of keywords of the target data row 2022A. The keyword extraction model may be a machine learning model (e.g., a neural network model) trained on the data rows of existing data tables, so that it extracts keywords that better express the characteristics of the target data row 2022A, thereby improving the subsequent data recommendation.
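As a rough illustration of keyword extraction from a data row, the sketch below substitutes a simple frequency heuristic for the trained keyword extraction model described above. It is not the model of the disclosure: the `extract_keywords` function, the `STOP_WORDS` list, and the `top_k` parameter are assumptions introduced only for this example.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "to", "in"}


def extract_keywords(row: Dict[str, str], top_k: int = 5) -> List[str]:
    """Return up to top_k most frequent non-stop-word tokens in the row's cells."""
    tokens: List[str] = []
    for value in row.values():
        # Lower-case each cell and split it into alphanumeric tokens.
        tokens.extend(re.findall(r"[a-z0-9]+", value.lower()))
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # most_common orders equal counts by first occurrence, so output is stable.
    return [word for word, _ in counts.most_common(top_k)]
```

A trained model would replace the body of `extract_keywords` while keeping the same input (a data row) and output (a list of keywords).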
In some embodiments, determining the characteristic information according to the plurality of keywords may further include generating a feature vector corresponding to the target data row according to the plurality of keywords, and then determining the characteristic information according to the plurality of keywords and the feature vector.
In this embodiment, after the plurality of keywords are obtained, they may be converted into a feature vector using a vectorization technique, and both the keywords and the feature vector may be determined as characteristic information of the target data row 2022A. Recommending based on these two kinds of characteristic information may improve the coverage and/or accuracy of the recommended data.
It will be appreciated that any vectorization technique capable of generating feature vectors from keywords may be used. Optionally, the vectorization technique may be a one-hot encoding algorithm, a vocabulary mapping algorithm (e.g., the Word2Vec algorithm), a word embedding algorithm, and so on.
In some embodiments, the feature vector may be generated directly based on the target data line 2022A instead of being generated according to a plurality of keywords, so that the feature vector may directly represent the feature information of the target data line 2022A without being affected by the keyword extraction algorithm, and further coverage rate and accuracy of the recommended data may be improved.
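As a minimal sketch of the simplest option named above, one-hot encoding, the keywords of a data line can be turned into a single vector by setting one position per vocabulary term (the sum of the per-keyword one-hot vectors). The vocabulary, the keywords, and the function name `multi_hot` are hypothetical and chosen only for illustration; the disclosure does not prescribe a particular encoding.

```python
def multi_hot(keywords, vocabulary):
    """Return a 0/1 vector with one position per vocabulary term."""
    present = set(keywords)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical vocabulary and keywords, for illustration only.
vocabulary = ["wireless", "headphones", "charger", "laptop"]
vector = multi_hot(["headphones", "wireless"], vocabulary)
```

A learned embedding (e.g., Word2Vec) would replace `multi_hot` with a model lookup, but the rest of the pipeline would consume the resulting vector in the same way.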
After determining the characteristic information, at step 304, at least one first data row 2022 matching the characteristic information may be acquired in the first data table 202 and/or at least one second data row 2122 matching the characteristic information may be acquired in a second data table 212 associated with the first data table 202, based on the characteristic information.
In this step, after the feature information of the target data line 2022A is obtained, matching may be performed in the first data table 202 to obtain the first data lines 2022 that match the feature information, and/or in the second data table 212 to obtain the second data lines 2122 that match the feature information. Feature matching in the first data table 202 or the second data table 212 thus yields first data lines 2022 or second data lines 2122 that can be recommended, so that associated data (or associated records) are recommended automatically and the user can find the data line to be associated in the recommendation result.
In some embodiments, matching based on the feature information may be performed in both the first data table 202 and the second data table 212, obtaining the first data lines 2022 and the second data lines 2122 that match the feature information. Two recall strategies are then used to obtain the associated data for recommendation, which may improve the coverage and accuracy of the recommended data relative to a single recall strategy.
Specifically, the first data table 202 contains a number of first data rows 2022 that are similar to the target data row 2022A and are already associated with second data rows 2122. If the second data rows 2122 associated with those similar first data rows 2022 are used as recommended data, the existing association experience can be migrated to the target data row 2022A, which enlarges the coverage of the recommended data and improves the recall rate.
In some embodiments, as shown in fig. 3B, the step of obtaining, according to the feature information, at least one first data row matching the feature information in the first data table and/or obtaining, according to the feature information, at least one second data row matching the feature information in a second data table associated with the first data table may further include the steps of:
At step 3042, at least one first candidate data line matching the plurality of keywords is obtained in the first data table according to the plurality of keywords.
In this step, at least one first data line 2022 matching the plurality of keywords may be found in the database corresponding to the first data table 202 as the at least one first candidate data line corresponding to the target data line 2022A based on the plurality of keywords of the target data line 2022A. Alternatively, a first data line 2022 having a degree of matching higher than a degree of matching threshold may be used as the first candidate data line.
It will be appreciated that any keyword matching algorithm may be utilized to find the first candidate data line. Optionally, a string matching algorithm (e.g., the Knuth-Morris-Pratt (KMP) algorithm), a prefix tree (Trie) algorithm, a multi-pattern matching algorithm (e.g., the Aho-Corasick (AC) automaton algorithm), and the like may be employed.
At step 3044, at least one second candidate data line matching the feature vector is obtained in the first data table according to the feature vector.
In this step, based on the feature vector of the target data line 2022A, at least one first data line 2022 matching the feature vector may be found in the vector database corresponding to the first data table 202 as the at least one second candidate data line corresponding to the target data line 2022A.
Optionally, the cosine similarity between vectors may be used as the similarity measure. Optionally, a first data line 2022 whose similarity is higher than a similarity threshold may be used as the second candidate data line.
In step 3046, the at least one first data line matching the characteristic information is determined from the at least one first candidate data line and the at least one second candidate data line.
In this step, the at least one first candidate data line and the at least one second candidate data line may be combined into one data set as the at least one first data line matched with the feature information, so that a range of the at least one first data line matched with the feature information may be enlarged, and a coverage rate of data may be improved.
In some embodiments, an intersection may be further taken from the at least one first candidate data line and the at least one second candidate data line as the at least one first data line matched with the feature information, so that the obtained first data line meets both a keyword matching degree requirement and a vector similarity requirement, and accuracy of data may be improved.
Moreover, since the information already filled in the target data line 2022A may contain rich text and key fields, it supports keyword extraction, which in turn allows recommendation data to be acquired later through keyword matching and vector similarity calculation.
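Steps 3042 to 3046 can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the matching degree is taken to be the fraction of keywords found in the row text (using Python's built-in substring test rather than KMP, a Trie, or an AC automaton), the rows are held in an in-memory dictionary rather than a database or vector database, and all names, thresholds, and sample rows are hypothetical.

```python
import math


def keyword_match_degree(keywords, row_text):
    """Fraction of keywords that occur as substrings of the row text."""
    if not keywords:
        return 0.0
    hits = sum(1 for kw in keywords if kw in row_text)
    return hits / len(keywords)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_rows(rows, keywords, vector, match_threshold, sim_threshold, mode="union"):
    """Return ids of rows recalled by keyword matching and/or vector similarity."""
    by_keyword = {rid for rid, row in rows.items()
                  if keyword_match_degree(keywords, row["text"]) > match_threshold}
    by_vector = {rid for rid, row in rows.items()
                 if cosine_similarity(vector, row["vector"]) > sim_threshold}
    # Union widens coverage; intersection requires both criteria to hold.
    return by_keyword | by_vector if mode == "union" else by_keyword & by_vector


# Hypothetical rows of the first data table.
rows = {
    "r1": {"text": "wireless headphones order", "vector": [1.0, 0.0]},
    "r2": {"text": "laptop charger order", "vector": [0.9, 0.1]},
    "r3": {"text": "wireless mouse", "vector": [0.0, 1.0]},
}
union = recall_rows(rows, ["wireless", "headphones"], [1.0, 0.0], 0.5, 0.8, "union")
both = recall_rows(rows, ["wireless", "headphones"], [1.0, 0.0], 0.5, 0.8, "intersection")
```

In this sample, the union contains both `r1` and `r2`, while the intersection keeps only `r1`, which illustrates the trade-off between coverage and accuracy described above.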
In some embodiments, as shown in fig. 3C, the step of obtaining, according to the feature information, at least one first data row matching the feature information in the first data table and/or obtaining, according to the feature information, at least one second data row matching the feature information in a second data table associated with the first data table may further include the steps of:
At step 3048, at least one third candidate data line matching the plurality of keywords is obtained in the second data table according to the plurality of keywords.
In this step, at least one second data line 2122 matching the plurality of keywords may be found in the database corresponding to the second data table 212 as the at least one third candidate data line corresponding to the target data line 2022A based on the plurality of keywords of the target data line 2022A. Optionally, a second data line 2122 with a degree of matching higher than a degree of matching threshold may be used as the third candidate data line.
It will be appreciated that any keyword matching algorithm may be utilized to find the third candidate data line. Optionally, a string matching algorithm (e.g., the Knuth-Morris-Pratt (KMP) algorithm), a prefix tree (Trie) algorithm, a multi-pattern matching algorithm (e.g., the Aho-Corasick (AC) automaton algorithm), and the like may be employed.
At step 3050, at least one fourth candidate data line matching the feature vector is obtained in the second data table according to the feature vector.
In this step, based on the feature vector of the target data line 2022A, at least one second data line 2122 matching the feature vector may be found in the vector database corresponding to the second data table 212 as the at least one fourth candidate data line corresponding to the target data line 2022A.
Optionally, the cosine similarity between vectors may be used as the similarity measure. Optionally, a second data line 2122 whose similarity is higher than a similarity threshold may be used as the fourth candidate data line.
At step 3052, the at least one second data line matching the characteristic information is determined from the at least one third candidate data line and the at least one fourth candidate data line.
In this step, the at least one third candidate data line and the at least one fourth candidate data line may be combined into one data set as the at least one second data line matched with the feature information, so that the range of the at least one second data line matched with the feature information may be enlarged, and the coverage rate of data may be improved.
In some embodiments, an intersection may be further taken from the at least one third candidate data line and the at least one fourth candidate data line as the at least one second data line matched with the feature information, so that the obtained second data line meets both the keyword matching degree requirement and the vector similarity requirement, and the accuracy of the data may be improved.
In addition to the foregoing recall strategy, in some embodiments, as shown in FIG. 3D, the method 300 provides a recall strategy based on recently edited records, and may further include the steps of:
At step 3054, at least one first edited data row in the first data table 202 that was edited within a first preset time period (e.g., within 3 days, within 5 days, within a week, etc.) before the current time is determined.
In this step, a first data line in the first data table 202 that has been edited within the first preset time period (e.g., within 3 days, within 5 days, within a week, etc.) before the current time may be determined as the first edited data line.
At step 3056, at least one data row in the second data table 212 associated with the target field (e.g., a commodity library field) of the at least one first edited data row is determined to be at least one third candidate data row.
In this step, after the at least one first edited data row is found, it may be determined whether the target field (e.g., the commodity library field) of each first edited data row is associated with a second data row 2122 in the second data table 212; if so, that second data row 2122 may be used as a third candidate data row. Note that this third candidate data row, recalled from recently edited rows, is distinct from the third candidate data line obtained by keyword matching at step 3048.
In this embodiment, the data lines recently edited by the user tend to be similar to the current target data line 2022A in subject or requirement and reflect records that have recently been popular or active. Recommending based on the at least one third candidate data line may therefore further improve the coverage of the recommended data and the recall rate.
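The recall strategy of steps 3054 and 3056 can be sketched as below. The row layout (`edited_at`, `linked_ids`), the function name, and the sample data are assumptions made for illustration. Duplicates are deliberately kept in the result so that a later stage can count how often each record was recalled.

```python
from datetime import datetime, timedelta


def recall_from_recent_edits(first_table_rows, now, window):
    """Collect second-table row ids linked from rows edited within `window` of `now`."""
    candidates = []
    for row in first_table_rows:
        if now - row["edited_at"] <= window:
            # The target field may link several records; keep all of them.
            candidates.extend(row["linked_ids"])
    return candidates


# Hypothetical first-table rows with their edit times and target-field links.
now = datetime(2025, 4, 1, 12, 0)
first_table_rows = [
    {"edited_at": datetime(2025, 3, 31), "linked_ids": ["A", "B"]},
    {"edited_at": datetime(2025, 3, 20), "linked_ids": ["C"]},
    {"edited_at": datetime(2025, 3, 30), "linked_ids": ["A"]},
]
recent = recall_from_recent_edits(first_table_rows, now, timedelta(days=3))
```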
In some embodiments, as shown in FIG. 3E, the method 300 also provides a recall strategy based on similar associated records, and may further include the steps of:
In step 3058, in response to a third data row already being associated in the target field of the target data row 2022A, at least one other data row in the first data table 202 that is associated with the third data row in the target field is determined.
In this embodiment, the target field may be a field capable of associating multiple records. If one record (i.e., the third data row) is already associated in the target field of the target data row 2022A, other data rows that, like the target data row 2022A, are associated with that record in the target field may be found in the first data table 202.
It will be appreciated that the data table associated with the target field is the second data table 212, so the data row already associated in the target field of the target data row 2022A also belongs to the second data table 212. It is called the third data row here only to distinguish it by name from the second data rows discussed above; in fact, the third data row may itself be a second data row 2122 in the second data table 212.
For clarity, taking the case where the second data table 212 is a commodity table as an example, the third data row may be the second data row corresponding to commodity A in the second data table 212. In this step, the first data rows associated with the second data row corresponding to commodity A may be searched for in the first data table 202 as the other data rows.
At step 3060, in response to the number of the at least one other data row being greater than a first number (e.g., 2, 3, 4, 5, 10, etc.), at least one fourth data row with which the at least one other data row is also associated in the target field is determined.
Taking the commodity table example again, in this step, when the number of first data rows associated with the second data row corresponding to commodity A is greater than the first number, the association with commodity A occurs frequently in the first data table 202. It may then be further determined whether those other data rows are also associated with at least one fourth data row (for example, the second data row corresponding to commodity B). The name fourth data row is used here only to distinguish it from the second data rows discussed above; in fact, the fourth data row may itself be a second data row 2122 in the second data table 212.
At step 3062, it is determined, for each of the fourth data rows, whether the number of the at least one other data row with which the fourth data row is associated is greater than a second number (e.g., 2, 3, 4, 5, 10, etc.).
Taking the commodity table example again, in this step, it may be determined whether the number of other data rows associated with the second data row corresponding to commodity B is greater than the second number, so as to determine whether the second data row corresponding to commodity B appears with high frequency among the other data rows.
At step 3064, each fourth data row for which the number of associated other data rows is greater than the second number is determined to be a fourth candidate data row.
Taking the commodity table example again, in this step, when the number of other data rows associated with the second data row corresponding to commodity B is greater than the second number, the second data row corresponding to commodity B is determined to appear with high frequency among the other data rows and can therefore be recommended to the user. Note that this fourth candidate data row, recalled from co-occurring associations, is distinct from the fourth candidate data line obtained by vector matching at step 3050.
In this embodiment, where many first data rows 2022 of the first data table 202 have each selected multiple association records (i.e., are associated with multiple second data rows), enough co-occurrence records can accumulate to form reliable association rules, which improves the recall rate and the recommendation accuracy.
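Steps 3058 to 3064 amount to a two-threshold co-occurrence count, which can be sketched as follows. The mapping from first-table row ids to the record ids linked in their target field, the function name, and the sample values are hypothetical; only the two thresholds (the first number and the second number) come from the description above.

```python
from collections import Counter


def recall_by_cooccurrence(first_table_links, target_row_id, linked_id,
                           first_number, second_number):
    """Return records that frequently co-occur with `linked_id` in other rows."""
    # Step 3058: other rows whose target field also links `linked_id`.
    others = [links for rid, links in first_table_links.items()
              if rid != target_row_id and linked_id in links]
    # Step 3060: proceed only if the association is frequent enough.
    if len(others) <= first_number:
        return []
    # Step 3062: count, per co-linked record, how many other rows link it.
    counts = Counter(other_id
                     for links in others
                     for other_id in set(links)
                     if other_id != linked_id)
    # Step 3064: keep records linked by more than `second_number` rows.
    return sorted(rid for rid, n in counts.items() if n > second_number)


# Hypothetical links: row "t" is the target row and already links commodity "A".
first_table_links = {
    "t": ["A"],
    "r1": ["A", "B"],
    "r2": ["A", "B", "C"],
    "r3": ["A", "B"],
    "r4": ["D"],
}
frequent = recall_by_cooccurrence(first_table_links, "t", "A", 2, 2)
```

Here three other rows link commodity A, all three also link commodity B, and only one links commodity C, so with both thresholds set to 2 only B is recalled.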
Returning to FIG. 3A, after some candidate data lines are obtained, recommended association data for the target data line may be determined at step 306 based on the at least one first data line and/or the at least one second data line that match the characteristic information.
Optionally, after obtaining the at least one first data line and/or the at least one second data line matching the characteristic information, the recommended association data for the target data line 2022A may be determined based on these data lines.
In some embodiments, the associated data recommendation request includes a request to recommend associated data for a target field of the target data row. As shown in fig. 3F, determining the recommended association data of the target data line according to the at least one first data line and/or the at least one second data line matched with the feature information may further include the following steps:
At step 3066, at least one data row in the second data table associated with the target field of the at least one first data row that matches the characteristic information is determined to be at least one first data row to be selected.
In this step, a data row of a second data table associated with the target field of the at least one first data row matched with the feature information may be found, and then the data row is used as the first data row to be selected.
For the sake of clarity, taking the commodity table as an example, assuming that a plurality of first data rows matching the feature information are found in the first data table 202, the second data rows in the commodity table that are associated with the target field (e.g., the commodity library field) of those first data rows may be found, and these associated second data rows are then taken as the first data rows to be selected.
At step 3068, the at least one second data line matching the characteristic information is determined as at least one second data line to be selected.
In this step, the at least one second data line found in the second data table 212 that matches the characteristic information may be directly determined as the at least one second data line to be selected.
In step 3070, the recommended association data corresponding to the target field of the target data row is determined according to the at least one first data row to be selected and the at least one second data row to be selected.
In this step, the at least one first data row to be selected and the at least one second data row to be selected are all second data rows 2122 in the second data table 212, and data available for association may be recommended for the target field of the target data row 2022A based on them. Because the data to be recommended fuses two recall strategies, the coverage of the recommended data can be improved.
As mentioned above, in some embodiments, determining the recommended association data corresponding to the target field of the target data row may further include determining it according to the at least one first data row to be selected, the at least one second data row to be selected, and the at least one third candidate data row. That is, the at least one third candidate data row obtained by the recall strategy based on recently edited records is also used as data to be recommended.
In some embodiments, determining the recommended association data corresponding to the target field of the target data row may further include determining it according to the at least one first data line to be selected, the at least one second data line to be selected, the at least one third data line to be selected, and the at least one fourth data line to be selected. That is, the at least one fourth data line to be selected, obtained by the recall strategy based on similar associated records, is also used as data to be recommended, which may further improve the coverage and the recommendation accuracy of the recommended data.
It can be understood that the four recall strategies in the embodiments of the present disclosure may be combined in various ways to obtain multiple fused recall strategies, any of which may be used to obtain the data to be recommended. Compared with a single recall strategy, a fused strategy may achieve a higher recall rate and data coverage and is therefore applicable to a wider range of application scenarios.
In some embodiments, the at least one first line of data to be selected and the at least one second line of data to be selected are data obtained based on feature matching, both of which may be filtered in order to further improve data accuracy.
As shown in fig. 3G, the determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first candidate data row, the at least one second candidate data row, the at least one third candidate data row, and the at least one fourth candidate data row may further include the following steps:
At step 30702, the at least one first line of data to be selected is input to a first screening model to obtain at least one first line of screening data.
Optionally, the first screening model is a large language model (LLM) that screens the at least one first data line to be selected based on a first prompt word (prompt). The first prompt word includes task information for screening the at least one first data line to be selected based on the association relationship between the first data line to be selected and the first data table.
Since the first data line to be selected is obtained from the first data lines 2022 matched in the first data table 202, the association relationship between the first data line to be selected and the first data table needs to be considered when screening it. In this embodiment, task information for screening the at least one first data line to be selected based on that association relationship is added to the first prompt word, so that the first screening model can screen accordingly and obtain more accurate data to be recommended, namely the at least one first screening data line.
At step 30704, the at least one second line of candidate data is input to a second screening model to obtain at least one second line of screening data.
The second screening model is a large language model, and screens the at least one second data line to be selected based on a second prompt word (prompt), wherein the second prompt word (prompt) comprises task information for screening the at least one second data line to be selected based on an association relationship between the first data table and the second data table.
Since the second data line to be selected is the second data line 2122 obtained from the second data table 212 based on the feature information of the target data line 2022A, the association relationship between the first data table and the second data table needs to be considered when screening the second data line to be selected. In this embodiment, task information for screening the at least one second data line to be selected based on the association relationship between the first data table and the second data table is added to the second prompt word, so that the second screening model can screen the at least one second data line to be selected based on the association relationship between the first data table 202 and the second data table 212 and obtain more accurate data to be recommended, namely the at least one second screening data line.
In step 30706, the recommendation associated data corresponding to the target field of the target data row is determined according to the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row.
In this embodiment, more accurate data to be recommended can be obtained by screening the first data line to be selected and the second data line to be selected.
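The disclosure does not give the wording of either prompt word. As a purely hypothetical sketch of how task information about a table relationship might be placed in a prompt, the helper below assembles a plain-text prompt from a relationship description, the target row, and the rows to be screened. The function name, the prompt wording, and the sample strings are all assumptions, and no call to any language model is shown.

```python
def build_screening_prompt(relation_description, target_row_text, candidates):
    """Assemble a plain-text screening prompt; the wording is illustrative only."""
    lines = [
        "Task: keep only the candidate rows that are plausible associations "
        "for the target row.",
        f"Table relationship: {relation_description}",
        f"Target row: {target_row_text}",
        "Candidates:",
    ]
    lines += [f"{i}. {text}" for i, text in enumerate(candidates, start=1)]
    lines.append("Answer with the numbers of the candidates to keep.")
    return "\n".join(lines)


# Hypothetical inputs for the second screening model.
prompt = build_screening_prompt(
    "each order row in the first table links to rows of the commodity table",
    "order for wireless headphones",
    ["wireless headphones, black", "laptop charger 65 W"],
)
```

The first and second screening models would differ only in the relationship description passed in, which is where the task information described above enters the prompt.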
In some embodiments, as shown in fig. 3H, the determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first filtering data row, the at least one second filtering data row, the at least one third candidate data row, and the at least one fourth candidate data row may further include the following steps:
At step 30708, at least one repeated data line (i.e., repeated identical data lines) of the at least one first screening data line, the at least one second screening data line, the at least one third candidate data line, and the at least one fourth candidate data line is determined.
At step 30710, one of the at least one repeated data line is retained and the remaining repeated data lines are deleted.
In this step, for each group of repeated identical data lines, only one data line is retained as a representative and the remaining data lines are deleted.
It will be appreciated that multiple groups of repeated identical data lines may occur; each group is processed as described above, leaving only one representative.
Optionally, after the deduplication process, information such as the multipath identifier of the retained data line (i.e., the identifiers of the other duplicate data lines that were deleted), the occurrence frequency of the group of duplicate data lines, and the latest update timestamp of the group may also be recorded for use in further processing.
In step 30712, the recommendation associated data corresponding to the target field of the target data row is determined according to the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row from which the remaining repeated data rows have been deleted.
With this embodiment, the data to be recommended is deduplicated so that the same data row does not appear more than once in the final recommendation result, which improves the user experience.
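A minimal sketch of this deduplication follows. It keeps one entry per row id while recording the occurrence count and the latest update timestamp. As an assumption about what the multipath identifier means, it also records which recall path produced each copy; the tuple layout, path names, and integer timestamps are hypothetical.

```python
def deduplicate(candidates):
    """Merge duplicates; `candidates` is a list of (row_id, recall_path, updated_at)."""
    merged = {}
    for row_id, path, updated_at in candidates:
        entry = merged.setdefault(
            row_id, {"paths": [], "count": 0, "latest": updated_at})
        if path not in entry["paths"]:
            entry["paths"].append(path)  # which recall paths produced this row
        entry["count"] += 1              # occurrences before deduplication
        entry["latest"] = max(entry["latest"], updated_at)
    return merged


# Hypothetical candidates gathered from several recall paths.
candidates = [
    ("A", "first", 100),
    ("B", "second", 90),
    ("A", "third", 120),
    ("A", "first", 80),
]
merged = deduplicate(candidates)
```

Because Python dictionaries preserve insertion order, the representatives come out in the order in which each row was first seen.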
In some embodiments, as shown in fig. 3I, the determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first filtering data row, the at least one second filtering data row, the at least one third candidate data row, and the at least one fourth candidate data row may further include the following steps:
In step 30714, sorting the at least one first screening data line according to the occurrence number of the same data line in the at least one first screening data line and/or the editing time of each first screening data line, so as to obtain a first internal sorting corresponding to the at least one first screening data line.
Since the first screening data lines are obtained by matching first data lines 2022 in the first data table 202 based on the feature information and then finding the second data lines 2122 associated with those first data lines 2022, different matched first data lines 2022 may be associated with the same second data line 2122. Therefore, in this step, the sorting may be performed according to the number of occurrences of the same data line in the at least one first screening data line.
For example, still taking the commodity table as an example, if one of the at least one first screening data row is the second data row corresponding to commodity A, which appeared 5 times before deduplication, and another is the second data row corresponding to commodity B, which appeared 3 times before deduplication, the second data row corresponding to commodity A may be recommended ahead of the second data row corresponding to commodity B.
In some embodiments, the sorting may also take the latest edit time into account. For example, first screening data lines having the same number of occurrences may be sorted by edit time from nearest to farthest, i.e., the most recently edited data line is ranked first.
In this way, the at least one first screening data row is internally sorted according to the first internal sorting, so that the data rows that better meet the user's needs can be recommended to the user first.
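The first internal sorting can be sketched as a two-key sort: occurrence count in descending order, with ties broken by edit time, most recent first. The field names and the use of integer timestamps are assumptions for illustration.

```python
def first_internal_sort(rows):
    """Sort by occurrence count (descending), then by edit time (most recent first)."""
    return sorted(rows, key=lambda r: (-r["count"], -r["edited_at"]))


# Hypothetical screening rows: count before deduplication and last edit timestamp.
rows = [
    {"id": "B", "count": 3, "edited_at": 50},
    {"id": "A", "count": 5, "edited_at": 10},
    {"id": "C", "count": 3, "edited_at": 70},
]
order = [r["id"] for r in first_internal_sort(rows)]
```

Commodity A comes first because it occurs most often; C precedes B because the two tie on count and C was edited more recently.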
In step 30716, the at least one second screening data row is ranked according to the matching degree of the at least one second screening data row corresponding to the plurality of keywords and/or the similarity of the at least one second screening data row and the feature vector, so as to obtain a second internal ranking corresponding to the at least one second screening data row.
Since the second screening data line is obtained by matching the feature information (keywords and/or feature vectors) in the second data table 212 to obtain the second data line 2122, the ranking may be based on the matching degree of the second screening data line with the plurality of keywords and/or the similarity with the feature vectors.
For example, the rows may be sorted first by keyword matching degree and then, among rows with the same keyword matching degree, by feature vector similarity. Conversely, the rows may be sorted first by feature vector similarity and then, among rows with the same feature vector similarity, by keyword matching degree.
In this way, the at least one second screening data row is internally sorted according to the second internal sorting, so that the data rows that better meet the user's needs can be recommended to the user first.
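Both orderings of the second internal sorting can be expressed with one function that swaps the primary and secondary keys. The field names `match` and `similarity`, the `primary` parameter, and the sample scores are hypothetical.

```python
def second_internal_sort(rows, primary="keyword"):
    """Sort by one score in descending order, breaking ties with the other score."""
    if primary == "keyword":
        key = lambda r: (-r["match"], -r["similarity"])
    else:
        key = lambda r: (-r["similarity"], -r["match"])
    return sorted(rows, key=key)


# Hypothetical screening rows with keyword matching degree and vector similarity.
rows = [
    {"id": "X", "match": 0.8, "similarity": 0.6},
    {"id": "Y", "match": 0.8, "similarity": 0.9},
    {"id": "Z", "match": 0.5, "similarity": 0.95},
]
by_keyword = [r["id"] for r in second_internal_sort(rows, "keyword")]
by_vector = [r["id"] for r in second_internal_sort(rows, "vector")]
```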
In step 30718, the at least one third candidate data row is sorted according to the number of occurrences of the same data row among the at least one third candidate data row and/or the editing time of each third candidate data row, so as to obtain a third internal sorting corresponding to the at least one third candidate data row.
Since the third candidate data rows are obtained by selecting at least one edited data row from the first data table 202 and then determining the second data rows associated in the target field of those edited data rows, the second data rows 2122 associated with different edited data rows may be identical. In this step, therefore, the sorting may be performed according to the number of occurrences of the same data row among the at least one third candidate data row.
For example, still taking the commodity table as an example, suppose one of the third candidate data rows is the second data row corresponding to commodity A, which appeared 5 times before deduplication, and another is the second data row corresponding to commodity B, which appeared 3 times before deduplication. The second data row corresponding to commodity A may then be recommended ahead of the second data row corresponding to commodity B.
In some embodiments, the sorting may also take the latest editing time into account. For example, third candidate data rows with the same number of occurrences may be sorted by how recently they were edited, i.e., the most recently edited data row is ranked first.
In this way, the at least one third candidate data row is ordered internally according to the third internal sorting, so that data rows that better meet the user's needs can be recommended first.
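The third internal sorting described above can be illustrated with a minimal sketch. It assumes that each recalled row is available, before deduplication, as a hypothetical `(row_id, edit_time)` pair with a numeric timestamp; the function name and data layout are illustrative and are not taken from the disclosure.

```python
from collections import Counter

def third_internal_sort(recalled):
    """Rank recalled rows by occurrence count, then by most recent edit time.

    recalled: list of (row_id, edit_time) pairs collected BEFORE deduplication.
    edit_time is assumed to be a numeric timestamp (larger means more recent).
    """
    counts = Counter(row_id for row_id, _ in recalled)
    latest = {}
    for row_id, edit_time in recalled:
        # Keep the most recent edit time seen for each row.
        latest[row_id] = max(latest.get(row_id, edit_time), edit_time)
    # Higher count first; ties broken by the more recent edit time.
    return sorted(counts, key=lambda r: (-counts[r], -latest[r]))

recalled = [("A", 10), ("B", 12), ("A", 11), ("C", 30),
            ("B", 5), ("A", 9), ("D", 40)]
ranking = third_internal_sort(recalled)
print(ranking)
```

Here row A (3 occurrences) precedes row B (2 occurrences), and rows D and C, which each occur once, are ordered by recency.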
In step 30720, the at least one fourth candidate data row is sorted according to the occurrence probability of each fourth candidate data row, so as to obtain a fourth internal sorting corresponding to the at least one fourth candidate data row.
Since the fourth candidate data rows are recalled on the basis that they have co-occurrence records with the target data row 2022A in the first data table 202, the conditional probability of each fourth candidate data row with respect to the target data row 2022A may be calculated, and the fourth candidate data rows may then be sorted by that conditional probability.
For example, taking the commodity table as an example, if the second data row corresponding to commodity A is already associated in the target data row 2022A, and one of the fourth candidate data rows is the second data row corresponding to commodity B, the conditional probability that commodity B is also associated given that commodity A is associated may be calculated and used as the conditional probability of the second data row corresponding to commodity B. In this manner, the conditional probability of each fourth candidate data row can be calculated, and the rows can then be sorted accordingly.
In this way, the at least one fourth candidate data row is ordered internally according to the fourth internal sorting, so that data rows that better meet the user's needs can be recommended first.
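As a rough illustration of the conditional probability used for the fourth internal sorting, the sketch below estimates P(B | A) as the number of first-table rows that associate both A and B divided by the number of rows that associate A. This frequency-based estimate and the names `association_sets` and `anchor` are assumptions made for the example; the disclosure does not fix a particular estimator.

```python
from collections import Counter

def conditional_probabilities(association_sets, anchor):
    """Estimate P(item | anchor) from co-occurrence counts.

    association_sets: one set per first-table row, holding the ids of the
    second-table rows associated in that row's target field.
    anchor: id of the row already associated in the target data row.
    """
    rows_with_anchor = [s for s in association_sets if anchor in s]
    if not rows_with_anchor:
        return {}
    co_counts = Counter()
    for linked in rows_with_anchor:
        for item in linked:
            if item != anchor:
                co_counts[item] += 1
    total = len(rows_with_anchor)
    return {item: count / total for item, count in co_counts.items()}

association_sets = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}, {"A", "B"}]
probs = conditional_probabilities(association_sets, "A")
# Sort candidates by descending conditional probability.
fourth_internal_sorting = sorted(probs, key=lambda item: -probs[item])
print(probs, fourth_internal_sorting)
```

In this toy data, four rows associate commodity A; three of them also associate B and two also associate C, so B is ranked ahead of C.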
At step 30722, an external ordering between the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row is determined.
In this step, priorities may also be set among the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row (i.e., an external ordering). That is, these four groups of data rows are treated as four sets, and priorities are set between the sets, so that after the four sets are combined it can be determined which set's data rows are recommended first.
In some embodiments, the external ordering follows the order of the fourth internal sorting, the second internal sorting, the first internal sorting, and the third internal sorting. That is, the at least one fourth candidate data row is recommended first (in the order given by its fourth internal sorting), followed by the at least one second screening data row, then the at least one first screening data row, and finally the at least one third candidate data row.
In this embodiment, since the fourth candidate data rows are recommended on the basis of co-occurrence association records, they are considered most likely to meet the user's needs. Next, the second screening data rows, which are obtained by directly matching the characteristic information in the second data table 212, can also meet the user's needs well. Then, the second data rows found after first data rows are acquired in the first data table 202 on the basis of similar rows can reflect the user's needs to a certain extent. Finally, the second data rows found after first data rows are acquired in the first data table 202 on the basis of the most recent edits are less likely to meet the user's needs, but still contribute to the recommendation.
In step 30724, the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row are sorted based on the external ordering, the first internal sorting, the second internal sorting, the third internal sorting, and the fourth internal sorting, so as to obtain the recommended association data.
In this embodiment, the recommended association data is ordered by this two-level rule, so that data rows more likely to meet the user's needs are ranked first and recommended with priority, which can improve the recommendation effect.
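The combination of external ordering and internal sortings can be sketched as a simple concatenation of the four ranked lists in priority order. The dictionary keys and the choice to keep the highest-priority copy of a duplicate row are assumptions of this sketch, not details stated in the disclosure.

```python
# External ordering described above: fourth, second, first, third.
EXTERNAL_ORDER = ("fourth", "second", "first", "third")

def merge_ranked(internal_rankings, external_order=EXTERNAL_ORDER):
    """Concatenate internally sorted lists following the external ordering.

    internal_rankings: dict mapping a set name to its internally sorted row ids.
    A row that appears in several sets is kept once, at its highest-priority
    position (an assumption made for this sketch).
    """
    merged, seen = [], set()
    for name in external_order:
        for row_id in internal_rankings.get(name, []):
            if row_id not in seen:
                seen.add(row_id)
                merged.append(row_id)
    return merged

internal_rankings = {
    "first": ["r1", "r2"],
    "second": ["r3", "r1"],
    "third": ["r4"],
    "fourth": ["r5", "r3"],
}
recommended = merge_ranked(internal_rankings)
print(recommended)
```

A missing or empty set is simply skipped, so the same function also covers cases in which some recall policies return nothing.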
It will be appreciated that the above embodiments are described for the case in which the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row are all obtained. In practice, each of the recall policies described above may have its own starting condition, so that the recalled data better meets the requirements or the recall rate is higher.
Thus, in some embodiments, according to the characteristic information, obtaining at least one first data row in the first data table that matches the characteristic information and/or obtaining at least one second data row in a second data table associated with the first data table that matches the characteristic information may further comprise the steps of:
In response to the target data line 2022A being not empty (i.e., the target data line 2022A may extract the characteristic information) and the first data table 202 including the first data line 2022 for which the association operation has been performed (i.e., the first data line 2022 for which an association record already exists in the first data table 202), the at least one first data line matching the characteristic information may be acquired in the first data table according to the characteristic information.
Thus, the recall strategy is started when the starting condition is met, and the recall rate can be improved.
In some embodiments, according to the feature information, acquiring at least one first data row matching the feature information in the first data table and/or acquiring at least one second data row matching the feature information in a second data table associated with the first data table may further comprise the steps of:
In response to the target data row 2022A being not empty (i.e., characteristic information can be extracted from the target data row 2022A) and the number of second data rows 2122 of the second data table 212 being greater than a fourth number (e.g., 5000, 10000, etc.), the at least one second data row 2122 matching the characteristic information is acquired, according to the characteristic information, in the second data table 212 associated with the first data table 202.
Thus, when the number of second data rows 2122 of the second data table 212 is larger, the recall policy can be adopted to obtain data rows more meeting the user requirement, and the recall rate can be improved.
In some embodiments, the determining at least one first edited data row in the first data table that is generated within a first preset time period from the current time may further include:
The at least one first edited data row is acquired in response to the first data table including the at least one first edited data row generated within the first preset time period from a current time and the target field of the first edited data row having performed an association operation (i.e., having a second data row associated with its target field).
Thus, the recall strategy is started when the starting condition is met, and the recall rate can be improved.
In some embodiments, the determining, in response to a third data row having been associated in the target field of the target data row, at least one other data row in the first data table that is associated with the third data row in the target field may further include:
In response to determining that the target field allows a plurality of the second data rows to be associated (i.e., the field allows multiple records to be added), that the proportion of first data rows in the first data table whose target field is associated with a plurality of second data rows is higher than a predetermined proportion (e.g., such first data rows account for 20%, 30%, 50%, etc. of the total number of data rows of the first data table), and that the number of first data rows of the first data table is greater than a fifth number (e.g., 20, 50, etc.), at least one other data row in the first data table that is associated, in the target field, with the third data row associated in the target field of the target data row is determined.
Therefore, the recall strategy is started when the starting condition is met, so that the co-occurrence record can better meet the requirements of users.
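The starting conditions of the four recall policies can be summarized in a small gating function. This is only a sketch: the `TableStats` fields, the policy names, and the default thresholds (taken from the example values mentioned above) are hypothetical, and the additional requirement that the target field already holds an associated row is left to the co-occurrence policy itself.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    """Hypothetical summary of the state needed to check starting conditions."""
    target_row_empty: bool
    first_table_has_associations: bool
    second_table_row_count: int
    recent_associated_edits: int      # recently edited rows whose target field is associated
    target_field_allows_multiple: bool
    multi_association_ratio: float    # share of first-table rows with several associations
    first_table_row_count: int

def enabled_policies(stats, fourth_number=5000, ratio_threshold=0.2, fifth_number=20):
    """Return the names of the recall policies whose starting condition holds."""
    enabled = []
    if not stats.target_row_empty and stats.first_table_has_associations:
        enabled.append("similar_rows")
    if not stats.target_row_empty and stats.second_table_row_count > fourth_number:
        enabled.append("direct_match")
    if stats.recent_associated_edits > 0:
        enabled.append("recent_edits")
    if (stats.target_field_allows_multiple
            and stats.multi_association_ratio > ratio_threshold
            and stats.first_table_row_count > fifth_number):
        enabled.append("co_occurrence")
    return enabled

stats = TableStats(False, True, 8000, 0, True, 0.35, 120)
print(enabled_policies(stats))
```

Checking the conditions up front keeps each policy from running in a scenario where it is unlikely to return useful rows.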
It will be appreciated that, with a small probability, all of the recall policies described above may be triggered and still fail to recall any data. Therefore, in some embodiments, a fallback recall policy is also provided. Specifically, the method may further include the following step:
in response to failing to acquire the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row, at least one second edited data row in the second data table 212 that was edited within a second preset time period (e.g., 1 day, 2 days, 3 days, 5 days, etc.) from the current time, or a sixth number (e.g., the most recent 20) of third edited data rows closest to the current time, is determined as the recommended association data of the target data row.
In this way, by using the most recently edited second data rows 2122 in the second data table 212 as fallback recommendation data, normal execution of the recommendation function can be ensured.
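A minimal sketch of this last-resort recall is given below. The disclosure presents the time-window variant and the fixed-count variant as alternatives; trying the window first and then falling back to the fixed count is an assumption of this example, as are the function name, the `(row_id, edit_time)` layout, and the numeric timestamps in seconds.

```python
def last_resort_recall(second_table_rows, now, window_seconds=86400, sixth_number=20):
    """Return recently edited second-table rows when every other policy is empty.

    second_table_rows: list of (row_id, edit_time) pairs, timestamps in seconds.
    Rows edited within window_seconds of `now` are preferred; if there are
    none, the sixth_number most recently edited rows are returned instead.
    """
    by_recency = sorted(second_table_rows, key=lambda row: row[1], reverse=True)
    recent = [row_id for row_id, edit_time in by_recency
              if now - edit_time <= window_seconds]
    if recent:
        return recent
    return [row_id for row_id, _ in by_recency[:sixth_number]]

rows = [("a", 100), ("b", 950), ("c", 990)]
print(last_resort_recall(rows, now=1000, window_seconds=100))
print(last_resort_recall(rows, now=1000, window_seconds=5, sixth_number=1))
```

Either branch returns rows ordered from most to least recently edited, so the result can be shown to the user directly.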
Returning to FIG. 3A, after the recommended association data is obtained, at step 308, an association data recommendation for the target data row 2022A may be generated from the recommended association data.
In this step, a part may be selected from the recommended association data as an association data recommendation result of the target data line 2022A. Because the recommended association data can be obtained based on the fused recall strategy, the specific screening strategy and/or the sorting strategy, the user requirements can be met based on the generated association data recommendation result.
In some embodiments, the generating the associated data recommendation for the target data row based on the recommended associated data may further include selecting a third number (e.g., 3, 5, etc.) of data rows from the recommended associated data as the associated data recommendation for the target data row based on the external rank, the first internal rank, the second internal rank, the third internal rank, and the fourth internal rank.
In this embodiment, according to the external sorting, the first internal sorting, the second internal sorting, the third internal sorting and the fourth internal sorting, a third number of data rows are selected from the recommendation related data to recommend the data rows, and the data rows with the front sorting can be preferentially recommended to the user, so that the user requirements can be more satisfied.
Fig. 4A shows a schematic diagram of an exemplary page 200, according to an embodiment of the present disclosure.
As shown in FIG. 4A, in some embodiments, when the user 104A triggers the recommendation control 216, the associated data recommendation comprising a plurality of recommendation data lines 220 may be displayed in the first page 210.
As shown in FIG. 4A, optionally, a change control 222 may be included in the first page 210. After the user 104A triggers the change control 222, the terminal device 102A may select, in order, the next third number of data rows from the recommended association data to replace the currently displayed recommended data rows 220, thereby swapping the displayed batch of data rows so that the user 104A can more easily find the data row to be associated.
As shown in FIG. 4A, in some embodiments, a check box 2202 is provided at the front of each data row, and the user 104A can associate the data row corresponding to a check box 2202 with the target data row 2022A by checking that check box 2202 and clicking the OK control 250, thereby completing the data association. Optionally, as shown in FIG. 4A, when the user 104A checks the check box 2202 of a recommended data row 220, the check box of the corresponding second data row 2122 in the second data table 212 is also selected, so as to indicate to the user 104A that the check operation is directed to a particular second data row 2122 in the second data table 212, and to keep the recommended data row consistent with the second data row.
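The selection of a third number of rows and the batch replacement triggered by the change control can be sketched as paging over the sorted recommended association data. Wrapping around to the first batch after the last one is an assumption of this sketch, and the function and parameter names are hypothetical.

```python
def recommendation_page(ranked_rows, page, third_number=3):
    """Return the batch of rows shown after the change control is triggered `page` times.

    ranked_rows: recommended association data, already sorted.
    page: 0 for the initial display, 1 after the first change, and so on.
    Wraps around to the first batch once the list is exhausted (assumption).
    """
    if not ranked_rows:
        return []
    page_count = (len(ranked_rows) + third_number - 1) // third_number
    start = (page % page_count) * third_number
    return ranked_rows[start:start + third_number]

ranked = ["a", "b", "c", "d", "e", "f", "g"]
for page in range(4):
    print(page, recommendation_page(ranked, page))
```

Because the batches are taken in ranked order, each change shows the next-best candidates rather than a random sample.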
Fig. 4B shows another schematic diagram of an exemplary page 200 according to an embodiment of the disclosure.
In some embodiments, as shown in FIG. 4B, after the user 104A selects a recommended data row from the associated data recommendation result and performs the association operation (e.g., the user 104A checks the check box 2202 corresponding to a particular second data row and clicks the OK control 250), the second data row is associated into the target data row 2022A of the first data table 202, and information of the second data row is displayed in the cell 2024A of the target data row 2022A corresponding to the target field 2024. Still taking the second data table 212 as the commodity table, as shown in FIG. 4A, suppose the user 104A selects the first recommended data row in the associated data recommendation result and the commodity corresponding to that recommended data row is a necklace. In this case, as shown in FIG. 4B, the name of the commodity (for example, necklace) may be displayed in the cell 2024A corresponding to the target field 2024 of the target data row 2022A, so that the user 104A knows that the target data row 2022A has been associated with the commodity "necklace" in the commodity table. It will be appreciated that the information displayed in the cell 2024A may also be other information that characterizes the content or features of the associated second data row. Taking the second data table 212 as the commodity table as an example, the information displayed in the cell 2024A may also be other information that characterizes the commodity, such as a commodity identification (ID).
In some embodiments, when the above-described association operation is a one-way association, the user 104A may also view the information of the second data row associated with the cell 2024A of the target data row 2022A by triggering that cell, so that the user can conveniently view detailed information of the data rows in other data tables associated with the target data row. For example, when the mouse hovers over the cell 2024A, the entire row or a portion of the content of the second data row associated with the cell 2024A may be displayed at a position adjacent to the cell 2024A (e.g., above or below it); when the content exceeds the page display range, a portion of the content may be displayed and the rest reached by a scroll bar. For another example, when the cell 2024A is clicked or double-clicked, the page may jump to the second data table 212 to show the entire row or a portion of the content of the second data row with which the cell 2024A is associated, with that row highlighted in the table.
In some embodiments, when the above-mentioned association operation is bidirectional association, in addition to the functions that can be achieved by implementing the above-mentioned unidirectional association, the information of the target data line 2022A may be checked in the second data line corresponding to the target data line 2022A of the second data table 212, so that the user may conveniently check the detailed information of the target data line 2022A in other data tables that implement bidirectional association with the target data line 2022A. Taking the first data table 202 as an order table and the second data table 212 as a commodity table as an example, referring to fig. 2B, the second data line 2122 of the second data table 212 may include an order number field, and when the target data line 2022A is associated with the second data line in a bidirectional manner, the order number field of the second data line 2122 is correspondingly associated with the target data line 2022A of the order table (the first data table 202), so that the user can view the information of the associated target data line 2022A by triggering the cell corresponding to the order number field of the second data line. Similarly, the entire row or portion of content of the target data row 2022A may be viewed by a hover operation, and the page of the first data table 202 may be jumped to by a single or double click to expose the first data table 202 and highlight the entire row or portion of content of the target data row 2022A with which the cell is associated therein.
From the above embodiments, it can be seen that the embodiments of the present disclosure provide a table data processing method which, by fusing at least two recall strategies, can improve the coverage and accuracy of data recommendation to a certain extent. In some embodiments, four complementary recall strategies are designed, so that potential association records (data rows) can be mined from different dimensions. In some embodiments, a union strategy is employed to merge the recall results, and deduplication is applied to obtain a more comprehensive candidate set. In some embodiments, a multi-level ranking strategy is adopted, in which the candidate sets are ranked according to the priority of the recall strategies and the ranking rule within each strategy, so that the accuracy of the recommendation result is improved. In some embodiments, starting conditions are set for each recall strategy, so that each strategy is started only in a suitable scenario, which improves recommendation efficiency and accuracy. In some embodiments, a fallback strategy is employed to ensure availability of the recommender system when none of the recall strategies is triggered or no data is recalled after triggering.
According to some embodiments of the present disclosure, multiple complementary recall strategies may be designed and applied for different data features and application scenarios, and each recall result is fused through a merge strategy, so that an optimal associated record candidate set is presented to a user according to a ranking strategy.
According to some embodiments of the present disclosure, by integrating multiple recall strategies, potential associated records can be mined from different dimensions, so that recall rates of the associated records are effectively improved, and valuable information is prevented from being omitted. According to some embodiments of the present disclosure, multiple recall strategies are used to complement each other, and LLM screening and multistage sorting strategies are combined, so that noise data can be effectively filtered, and accuracy and relevance of a recommendation result can be improved. Some embodiments of the present disclosure can adaptively select and combine appropriate recall policies for different data features and application scenarios, enhancing robustness of the system.
It will be appreciated that the data recommendation algorithm described above may be applied to more than just the scenarios described in the embodiments described above. In the age of information explosion, when users face mass data, it is important to quickly and accurately find the required information. Therefore, the data recommendation algorithm can be widely applied to various application scenes, such as knowledge base construction, data analysis, intelligent assistants and the like.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present disclosure, the devices interacting with each other to accomplish the methods.
It should be noted that the foregoing describes some embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The disclosed embodiments also provide a computer device for implementing the above-described method 300. Fig. 5 shows a hardware architecture diagram of an exemplary computer device 500 provided by an embodiment of the present disclosure. The computer device 500 may be used to implement the server 106 of fig. 1, as well as the terminal devices 102A, 102B of fig. 1. In some scenarios, the computer device 500 may also be used to implement the database server 108 of FIG. 1.
As shown in FIG. 5, computer device 500 may include a processor 502, a memory 504, a network interface 506, a peripheral interface 508, and a bus 510. Wherein the processor 502, the memory 504, the network interface 506 and the peripheral interface 508 enable a communication connection therebetween within the computer device 500 via a bus 510.
The processor 502 may be a central processing unit (CPU), an image processor, a neural network processor (NPU), a microcontroller (MCU), a programmable logic device, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits. The processor 502 may be used to perform functions related to the techniques described in this disclosure. In some embodiments, processor 502 may also include multiple processors integrated as a single logical component. For example, as shown in FIG. 5, the processor 502 may include a plurality of processors 502a, 502b, and 502c.
Memory 504 may be configured to store data (e.g., instructions, computer code, etc.). As shown in FIG. 5, the data stored by the memory 504 may include program instructions (e.g., program instructions for implementing the method 300 of embodiments of the present disclosure) as well as data to be processed (e.g., the memory may store configuration files of other modules, etc.). The processor 502 may also access program instructions and data stored in the memory 504 and execute the program instructions to perform operations on the data to be processed. Memory 504 may include volatile storage or nonvolatile storage. In some embodiments, memory 504 may include random access memory (RAM), read-only memory (ROM), optical disks, magnetic disks, hard disks, solid-state disks (SSD), flash memory, memory sticks, and the like.
The network interface 506 may be configured to provide the computer device 500 with communications with other external devices via a network. The network may be any wired or wireless network capable of transmitting and receiving data. For example, the network may be a wired network, a local wireless network (e.g., Bluetooth, Wi-Fi, Near Field Communication (NFC), etc.), a cellular network, the Internet, or a combination of the foregoing. It will be appreciated that the type of network is not limited to the specific examples described above.
Peripheral interface 508 may be configured to connect computer apparatus 500 with one or more peripheral devices to enable information input and output. For example, the peripheral devices may include input devices such as keyboards, mice, touchpads, touch screens, microphones, various types of sensors, and output devices such as displays, speakers, vibrators, and indicators.
Bus 510 may be configured to transfer information between the various components of computer device 500 (e.g., processor 502, memory 504, network interface 506, and peripheral interface 508), such as an internal bus (e.g., processor-memory bus), an external bus (USB port, PCI-E bus), etc.
It should be noted that, although the architecture of the computer device 500 described above illustrates only the processor 502, the memory 504, the network interface 506, the peripheral interface 508, and the bus 510, in a specific implementation, the architecture of the computer device 500 may also include other components necessary to achieve proper operation. Moreover, those skilled in the art will appreciate that the architecture of the computer device 500 described above may include only the components necessary to implement the disclosed embodiments, and not all of the components shown in the figures.
The embodiment of the disclosure also provides a table data processing device. Fig. 6 shows a schematic diagram of an exemplary apparatus 600 provided by an embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 may be used to implement the method 300 and may further include the following modules.
A first determining module 602 configured to determine characteristic information of a target data row in a first data table in response to an associated data recommendation request for the target data row;
an obtaining module 604 configured to obtain, according to the feature information, at least one first data line matching the feature information in the first data table and/or at least one second data line matching the feature information in a second data table associated with the first data table;
a second determining module 606 configured to determine recommended association data of the target data line based on the at least one first data line and/or the at least one second data line matching the characteristic information;
A generation module 608 is configured to generate an associated data recommendation result of the target data row according to the recommended associated data.
In some embodiments, the first determining module 602 is configured to extract a plurality of keywords from the target data line, and determine the feature information according to the plurality of keywords.
In some embodiments, the first determining module 602 is configured to generate a feature vector corresponding to the target data line according to the plurality of keywords, and determine the feature information according to the plurality of keywords and the feature vector.
In some embodiments, the acquisition module 604 is configured to:
Acquiring at least one first candidate data row matched with the keywords in the first data table according to the keywords;
according to the feature vector, at least one second candidate data row matched with the feature vector is acquired in the first data table;
And determining the at least one first data line matched with the characteristic information according to the at least one first candidate data line and the at least one second candidate data line.
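The keyword-based and feature-vector-based retrieval performed by the acquisition module can be illustrated as follows. The sketch assumes that each row is available as a hypothetical `(row_id, text, vector)` triple, that keyword matching is a case-insensitive substring test, that vector matching uses cosine similarity with a threshold, and that the two candidate lists are combined by an order-preserving union; none of these specifics are fixed by the description above.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def keyword_matches(rows, keywords):
    """First candidates: rows whose text contains at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [rid for rid, text, _ in rows if any(k in text.lower() for k in lowered)]

def vector_matches(rows, query_vec, threshold=0.8):
    """Second candidates: rows whose vector is close enough to the query vector."""
    return [rid for rid, _, vec in rows if cosine(vec, query_vec) >= threshold]

def hybrid_recall(rows, keywords, query_vec, threshold=0.8):
    """Union of both candidate lists, keeping first-seen order."""
    merged = keyword_matches(rows, keywords) + vector_matches(rows, query_vec, threshold)
    return list(dict.fromkeys(merged))

rows = [
    ("r1", "Gold necklace order", [1.0, 0.0]),
    ("r2", "Silver ring", [0.9, 0.1]),
    ("r3", "Shipping note", [0.0, 1.0]),
]
print(hybrid_recall(rows, ["necklace"], [1.0, 0.0]))
```

The keyword path catches exact term overlap, while the vector path can recall rows such as `r2` that share no keyword with the query but are close in the embedding space.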
In some embodiments, the acquisition module 604 is configured to:
acquiring at least one third candidate data row matched with the keywords in the second data table according to the keywords;
According to the feature vector, at least one fourth candidate data row matched with the feature vector is obtained in the second data table;
and determining the at least one second data line matched with the characteristic information according to the at least one third candidate data line and the at least one fourth candidate data line.
In some embodiments, the associated data recommendation request includes a request to recommend associated data for a target field of the target data row;
a second determination module 606 configured to:
Determining at least one data row in the second data table associated with the target field of the at least one first data row matching the characteristic information as at least one first data row to be selected;
Determining the at least one second data line matched with the characteristic information as at least one second data line to be selected;
and determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first data row to be selected and the at least one second data row to be selected.
In some embodiments, the obtaining module 604 is configured to determine at least one first edited data row in the first data table that is generated within a first preset time period from a current time; determining at least one data row in the second data table associated with the target field of the at least one first edited data row as at least one third candidate data row;
the second determining module 606 is configured to determine the recommendation associated data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected, and the at least one third data row to be selected.
In some embodiments, the acquisition module 604 is configured to:
Determining, in response to a third data row having been associated in the target field of the target data row, at least one other data row in the first data table that is associated with the third data row in the target field;
Responsive to the number of the at least one other data line being greater than the first number, determining at least one fourth data line to which the at least one other data line is also associated in the target field;
Determining, for each of the fourth data rows, whether the number of the at least one other data row with which the fourth data row is associated is greater than a second number;
determining at least one of the fourth data lines, for which the number of the associated at least one other data line is greater than the second number, as at least one fourth candidate data line;
The second determining module 606 is configured to determine the recommendation associated data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected, the at least one third data row to be selected, and the at least one fourth data row to be selected.
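The co-occurrence recall with its two thresholds can be sketched as below. The `associations` mapping, the function name, and the example threshold values are hypothetical; the logic follows the steps listed above: find the other rows that share the third data row, require more of them than the first number, then keep each co-associated row that is associated by more than the second number of those rows.

```python
from collections import Counter

def co_occurrence_recall(associations, target_row, third_row,
                         first_number=2, second_number=1):
    """Recall rows that frequently co-occur with `third_row` in the target field.

    associations: dict mapping a first-table row id to the set of second-table
    row ids associated in its target field.
    """
    # Other first-table rows that also associate the third data row.
    others = [row for row, linked in associations.items()
              if row != target_row and third_row in linked]
    if len(others) <= first_number:
        return []
    # Count, for each co-associated row, how many of the other rows link it.
    counts = Counter()
    for row in others:
        for linked in associations[row]:
            if linked != third_row:
                counts[linked] += 1
    return sorted(row for row, n in counts.items() if n > second_number)

associations = {
    "t": {"A"},
    "o1": {"A", "B"},
    "o2": {"A", "B", "C"},
    "o3": {"A", "B"},
    "o4": {"D"},
}
print(co_occurrence_recall(associations, "t", "A"))
```

With these toy values, three other rows share A, B is linked by all three and C by only one, so only B passes the second threshold.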
In some embodiments, the second determining module 606 is configured to:
inputting the at least one first data line to be selected into a first screening model to obtain at least one first screening data line;
inputting the at least one second data line to be selected into a second screening model to obtain at least one second screening data line;
Determining the recommended association data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third to-be-selected data row and the at least one fourth to-be-selected data row;
The first screening model and the second screening model are large language models, the first screening model screens the at least one first data line to be selected based on a first prompting word, the second screening model screens the at least one second data line to be selected based on a second prompting word, the first prompting word comprises task information for screening the at least one first data line to be selected based on an association relation between the first data line to be selected and the first data table, and the second prompting word comprises task information for screening the at least one second data line to be selected based on an association relation between the first data table and the second data table.
In some embodiments, the second determining module 606 is configured to:
determining at least one duplicate data row among the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected;
retaining one of the at least one duplicate data row and deleting the remaining duplicate data rows;
determining the recommended associated data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected from which the remaining duplicate data rows have been deleted.
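For illustration only, de-duplication across the four groups might look like the following sketch. The disclosure does not state which copy of a duplicate is retained; keeping the first occurrence in the order in which the groups are passed is an assumption of this example, as is the function name `drop_duplicates`.

```python
def drop_duplicates(*groups):
    """Remove rows that repeat within or across groups, keeping the first copy."""
    seen = set()
    result = []
    for group in groups:
        kept = []
        for row in group:
            if row not in seen:
                seen.add(row)
                kept.append(row)
        result.append(kept)
    return result


first_screening = ["a", "b"]
second_screening = ["b", "c"]
third_to_select = ["a"]
fourth_to_select = ["d", "d"]
deduplicated = drop_duplicates(
    first_screening, second_screening, third_to_select, fourth_to_select
)
```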
In some embodiments, the second determining module 606 is configured to:
sorting the at least one first screening data row according to the number of occurrences of the same data row in the at least one first screening data row and/or the editing time of each first screening data row, so as to obtain a first internal ordering corresponding to the at least one first screening data row;
sorting the at least one second screening data row according to the degree of matching between each second screening data row and the plurality of keywords and/or the similarity between each second screening data row and the feature vector, so as to obtain a second internal ordering corresponding to the at least one second screening data row;
sorting the at least one third data row to be selected according to the number of occurrences of the same data row in the at least one third data row to be selected and/or the editing time of each third data row to be selected, so as to obtain a third internal ordering corresponding to the at least one third data row to be selected;
sorting the at least one fourth data row to be selected according to the occurrence probability of each fourth data row to be selected, so as to obtain a fourth internal ordering corresponding to the at least one fourth data row to be selected;
determining an external ordering among the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected;
sorting the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering and the fourth internal ordering, so as to obtain the recommended associated data.
In some embodiments, the external ordering is in the order of the fourth internal ordering, the second internal ordering, the first internal ordering, the third internal ordering;
the generating module 608 is configured to select a third number of data rows from the recommended associated data as the associated data recommendation result of the target data row based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering, and the fourth internal ordering.
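The interplay of internal and external ordering can be sketched, by way of example only, as below. The sort direction (more occurrences first, then more recent edits first), the numeric `edit_time` values and the names `internal_sort_by_count_and_time` and `select_recommendations` are assumptions of this sketch; the external order of fourth, second, first, third follows the embodiment described above.

```python
from collections import Counter


def internal_sort_by_count_and_time(rows):
    """Order row ids by occurrence count, then by latest edit time.

    rows: list of (row_id, edit_time) pairs; a row id may appear several times.
    """
    counts = Counter(row_id for row_id, _ in rows)
    latest = {}
    for row_id, edit_time in rows:
        latest[row_id] = max(edit_time, latest.get(row_id, edit_time))
    # Assumed direction: frequent rows first, ties broken by recency.
    return sorted(counts, key=lambda row_id: (-counts[row_id], -latest[row_id]))


def select_recommendations(first, second, third, fourth, third_number):
    """Concatenate internally sorted groups in the external order and truncate."""
    ordered = []
    for group in (fourth, second, first, third):
        ordered.extend(group)
    return ordered[:third_number]


first_sorted = internal_sort_by_count_and_time(
    [("x", 1), ("y", 5), ("x", 3), ("z", 9)]
)
recommended = select_recommendations(
    first=first_sorted, second=["s1"], third=["t1"], fourth=["f1", "f2"], third_number=4
)
```

Here "x" leads the first group because it occurs twice, and the final result takes the fourth group, then the second, then as much of the first group as the limit allows.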
In some embodiments, the obtaining module 604 is configured to obtain the at least one first data row matching the characteristic information in the first data table according to the characteristic information in response to the target data row not being empty and the first data table including the first data row for which an association operation has been performed.
In some embodiments, the obtaining module 604 is configured to obtain the at least one second data row matching the characteristic information in the second data table associated with the first data table according to the characteristic information in response to the target data row not being empty and the number of the second data rows of the second data table being greater than a fourth number.
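As a minimal sketch under stated assumptions, the two preconditions above can be read as gates that decide which retrieval paths run. The dictionary-based row representation, the treatment of `None` and empty strings as empty cells, and the name `enabled_retrievals` are hypothetical choices made for this example.

```python
def enabled_retrievals(target_row, first_table_links, second_table_rows, fourth_number):
    """Return which retrieval paths are allowed to run.

    target_row: dict of field name to cell value.
    first_table_links: dict of first-table row id to its associated rows.
    second_table_rows: list of rows in the second data table.
    """
    # Assumption: a target row is "not empty" if any cell holds a value.
    target_not_empty = any(value not in (None, "") for value in target_row.values())
    paths = []
    # The first table must contain at least one row with an association.
    if target_not_empty and any(first_table_links.values()):
        paths.append("first_table")
    # The second table must hold more rows than the fourth number.
    if target_not_empty and len(second_table_rows) > fourth_number:
        paths.append("second_table")
    return paths


paths = enabled_retrievals(
    {"title": "Fix login"},
    {"r1": ["s1"], "r2": []},
    ["s1", "s2", "s3"],
    fourth_number=2,
)
```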
In some embodiments, the obtaining module 604 is configured to obtain the at least one first edited data row in response to the first data table including the at least one first edited data row produced within the first preset time period from the current time and an association operation having been performed on the target field of the first edited data row.
In some embodiments, the obtaining module 604 is configured to, in response to determining that the target field allows association with a plurality of the second data rows, that the proportion of the first data rows in the first data table whose target field is already associated with a plurality of the second data rows is higher than a preset proportion, and that the number of the first data rows in the first data table is greater than a fifth number, determine, when a third data row has been associated in the target field of the target data row, at least one other data row in the first data table that is associated with the third data row in the target field.
In some embodiments, the obtaining module 604 is configured to, in response to failure to obtain the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected, determine at least one second edited data row in the second data table edited within a second preset time period from the current time, or a sixth number of third edited data rows closest to the current time, as the recommended associated data of the target data row.
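The fallback described above might be sketched as follows, for illustration only. The disclosure presents the time-window rows and the most recently edited rows as alternatives; trying the time window first and then falling back to the most recent rows is an assumption of this example, as are the numeric timestamps and the name `fallback_recommendations`.

```python
def fallback_recommendations(rows, now, window, sixth_number):
    """Pick recently edited rows of the second data table.

    rows: list of (row_id, edit_time) pairs with numeric edit times.
    """
    newest_first = sorted(rows, key=lambda row: row[1], reverse=True)
    # Rows edited within the preset time period before the current time.
    recent = [row_id for row_id, edit_time in newest_first if now - edit_time <= window]
    if recent:
        return recent
    # Otherwise, the given number of rows edited closest to the current time.
    return [row_id for row_id, _ in newest_first[:sixth_number]]


rows = [("a", 100), ("b", 90), ("c", 10)]
within_window = fallback_recommendations(rows, now=105, window=20, sixth_number=1)
most_recent = fallback_recommendations(rows, now=105, window=1, sixth_number=1)
```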
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of the various modules may be implemented in the same one or more pieces of software and/or hardware when implementing the present disclosure.
The apparatus of the foregoing embodiments is configured to implement the corresponding method 300 in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein.
Based on the same inventive concept, the present disclosure also provides a non-transitory computer-readable storage medium containing a computer program, corresponding to any of the above-described embodiment methods, which when executed by one or more processors, causes the one or more processors to perform the method 300.
The computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The storage medium of the above embodiment stores a computer program for causing the one or more processors to perform the method 300 according to any one of the above embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
Based on the same inventive concept, and corresponding to the method 300 of any of the embodiments described above, the present disclosure also provides a computer program product comprising a computer program. In some embodiments, the computer program is executable by one or more processors to cause the processors to perform the method 300. The processor that executes a given step may belong to the execution body that performs that step in the embodiments of the method 300.
The computer program product of the above embodiment is configured to cause a processor to perform the method 300 of any of the above embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
It will be appreciated by persons skilled in the art that the foregoing discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the disclosure, the features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and many other variations of the different aspects of the disclosed embodiments exist, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion and so as not to obscure the embodiments of the present disclosure. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also accounts for the fact that specifics with respect to the implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present disclosure are to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like that are within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (20)

1. A table data processing method, comprising:
in response to an associated data recommendation request for a target data row in a first data table, determining characteristic information of the target data row;
acquiring, according to the characteristic information, at least one first data row matching the characteristic information in the first data table and/or at least one second data row matching the characteristic information in a second data table associated with the first data table;
determining recommended associated data of the target data row according to the at least one first data row and/or the at least one second data row matching the characteristic information;
generating an associated data recommendation result of the target data row according to the recommended associated data.
2. The method of claim 1, wherein determining the characteristic information of the target data row further comprises:
extracting a plurality of keywords from the target data row, and determining the characteristic information according to the plurality of keywords.
3. The method of claim 2, wherein determining the characteristic information according to the plurality of keywords further comprises:
generating a feature vector corresponding to the target data row according to the plurality of keywords;
determining the characteristic information according to the plurality of keywords and the feature vector.
4. The method of claim 3, wherein acquiring, according to the characteristic information, at least one first data row matching the characteristic information in the first data table and/or at least one second data row matching the characteristic information in a second data table associated with the first data table further comprises:
acquiring, according to the plurality of keywords, at least one first candidate data row matching the plurality of keywords in the first data table;
acquiring, according to the feature vector, at least one second candidate data row matching the feature vector in the first data table;
determining the at least one first data row matching the characteristic information according to the at least one first candidate data row and the at least one second candidate data row.
5. The method of claim 3, wherein acquiring, according to the characteristic information, at least one first data row matching the characteristic information in the first data table and/or at least one second data row matching the characteristic information in a second data table associated with the first data table further comprises:
acquiring, according to the plurality of keywords, at least one third candidate data row matching the plurality of keywords in the second data table;
acquiring, according to the feature vector, at least one fourth candidate data row matching the feature vector in the second data table;
determining the at least one second data row matching the characteristic information according to the at least one third candidate data row and the at least one fourth candidate data row.
6. The method of claim 3, wherein the associated data recommendation request comprises a request for recommending associated data for a target field of the target data row;
wherein determining the recommended associated data of the target data row according to the at least one first data row and/or the at least one second data row matching the characteristic information further comprises:
determining at least one data row in the second data table that is associated with the target field of the at least one first data row matching the characteristic information as at least one first data row to be selected;
determining the at least one second data row matching the characteristic information as at least one second data row to be selected;
determining the recommended associated data corresponding to the target field of the target data row according to the at least one first data row to be selected and the at least one second data row to be selected.
7. The method of claim 6, further comprising:
determining at least one first edited data row generated in the first data table within a first preset time period from the current time;
determining at least one data row in the second data table that is associated with the target field of the at least one first edited data row as at least one third data row to be selected;
wherein determining the recommended associated data corresponding to the target field of the target data row according to the at least one first data row to be selected and the at least one second data row to be selected further comprises: determining the recommended associated data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected and the at least one third data row to be selected.
8. The method of claim 7, further comprising:
in response to a third data row having been associated in the target field of the target data row, determining at least one other data row in the first data table that is associated with the third data row in the target field;
in response to the number of the at least one other data row being greater than a first number, determining at least one fourth data row with which the at least one other data row is also associated in the target field;
determining, for each of the fourth data rows, whether the number of the at least one other data row with which the fourth data row is associated is greater than a second number;
determining at least one of the fourth data rows for which the number of associated other data rows is greater than the second number as at least one fourth data row to be selected;
wherein determining the recommended associated data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected and the at least one third data row to be selected further comprises: determining the recommended associated data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected, the at least one third data row to be selected and the at least one fourth data row to be selected.
9. The method of claim 8, wherein determining the recommended associated data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected, the at least one third data row to be selected and the at least one fourth data row to be selected further comprises:
inputting the at least one first data row to be selected into a first screening model to obtain at least one first screening data row;
inputting the at least one second data row to be selected into a second screening model to obtain at least one second screening data row;
determining the recommended associated data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected;
wherein the first screening model and the second screening model are both large language models, the first screening model screens the at least one first data row to be selected based on a first prompt, the second screening model screens the at least one second data row to be selected based on a second prompt, the first prompt comprises task information for screening the at least one first data row to be selected based on the association relationship between the first data row to be selected and the first data table, and the second prompt comprises task information for screening the at least one second data row to be selected based on the association relationship between the first data table and the second data table.
10. The method of claim 9, wherein determining the recommended associated data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected further comprises:
determining at least one duplicate data row among the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected;
retaining one of the at least one duplicate data row and deleting the remaining duplicate data rows;
determining the recommended associated data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected from which the remaining duplicate data rows have been deleted.
11. The method of claim 9, wherein determining the recommended associated data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected further comprises:
sorting the at least one first screening data row according to the number of occurrences of the same data row in the at least one first screening data row and/or the editing time of each first screening data row, to obtain a first internal ordering corresponding to the at least one first screening data row;
sorting the at least one second screening data row according to the degree of matching between each second screening data row and the plurality of keywords and/or the similarity between each second screening data row and the feature vector, to obtain a second internal ordering corresponding to the at least one second screening data row;
sorting the at least one third data row to be selected according to the number of occurrences of the same data row in the at least one third data row to be selected and/or the editing time of each third data row to be selected, to obtain a third internal ordering corresponding to the at least one third data row to be selected;
sorting the at least one fourth data row to be selected according to the occurrence probability of each fourth data row to be selected, to obtain a fourth internal ordering corresponding to the at least one fourth data row to be selected;
determining an external ordering among the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected;
sorting the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering and the fourth internal ordering, to obtain the recommended associated data.
12. The method of claim 11, wherein the external ordering follows the order of the fourth internal ordering, the second internal ordering, the first internal ordering and the third internal ordering;
wherein generating the associated data recommendation result of the target data row according to the recommended associated data further comprises: selecting, based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering and the fourth internal ordering, a third number of data rows from the recommended associated data as the associated data recommendation result of the target data row.
13. The method of claim 1, wherein acquiring, according to the characteristic information, at least one first data row matching the characteristic information in the first data table and/or at least one second data row matching the characteristic information in a second data table associated with the first data table further comprises:
in response to the target data row not being empty and the first data table including the first data row on which an association operation has been performed, acquiring, according to the characteristic information, the at least one first data row matching the characteristic information in the first data table; and/or
in response to the target data row not being empty and the number of the second data rows of the second data table being greater than a fourth number, acquiring, according to the characteristic information, the at least one second data row matching the characteristic information in the second data table associated with the first data table.
14. The method of claim 7, wherein determining at least one first edited data row generated in the first data table within a first preset time period from the current time further comprises:
in response to the first data table including the at least one first edited data row generated within the first preset time period from the current time and an association operation having been performed on the target field of the first edited data row, acquiring the at least one first edited data row.
15. The method of claim 8, wherein, in response to a third data row having been associated in the target field of the target data row, determining at least one other data row in the first data table that is associated with the third data row in the target field further comprises:
in response to determining that the target field allows association with a plurality of the second data rows, that the proportion of the first data rows in the first data table whose target field is already associated with a plurality of the second data rows is higher than a preset proportion, and that the number of the first data rows in the first data table is greater than a fifth number, determining, when a third data row has been associated in the target field of the target data row, at least one other data row in the first data table that is associated with the third data row in the target field.
16. The method of claim 9, further comprising:
in response to failure to obtain the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected and the at least one fourth data row to be selected, determining at least one second edited data row in the second data table edited within a second preset time period from the current time, or a sixth number of third edited data rows closest to the current time, as the recommended associated data of the target data row.
17. A table data processing apparatus, comprising:
a first determining module configured to: in response to an associated data recommendation request for a target data row in a first data table, determine characteristic information of the target data row;
an obtaining module configured to: acquire, according to the characteristic information, at least one first data row matching the characteristic information in the first data table and/or at least one second data row matching the characteristic information in a second data table associated with the first data table;
a second determining module configured to: determine recommended associated data of the target data row according to the at least one first data row and/or the at least one second data row matching the characteristic information;
a generating module configured to: generate an associated data recommendation result of the target data row according to the recommended associated data.
18. A computer device, comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, and the one or more programs comprise instructions for performing the method of any one of claims 1 to 16.
19. A non-volatile computer-readable storage medium containing a computer program which, when executed by one or more processors, causes the one or more processors to perform the method of any one of claims 1 to 16.
20. A computer program product, comprising one or more computer programs, wherein the one or more computer programs, when executed by one or more processors, implement the steps of the method of any one of claims 1 to 16.
CN202510407757.0A 2025-04-01 2025-04-01 Table data processing method and device, computer equipment, storage medium, program product Pending CN120336628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510407757.0A CN120336628A (en) 2025-04-01 2025-04-01 Table data processing method and device, computer equipment, storage medium, program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510407757.0A CN120336628A (en) 2025-04-01 2025-04-01 Table data processing method and device, computer equipment, storage medium, program product

Publications (1)

Publication Number Publication Date
CN120336628A true CN120336628A (en) 2025-07-18

Family

ID=96364491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510407757.0A Pending CN120336628A (en) 2025-04-01 2025-04-01 Table data processing method and device, computer equipment, storage medium, program product

Country Status (1)

Country Link
CN (1) CN120336628A (en)

Similar Documents

Publication Publication Date Title
CN112052387B (en) Content recommendation method, device and computer readable storage medium
US9530075B2 (en) Presentation and organization of content
US10152479B1 (en) Selecting representative media items based on match information
WO2021003932A1 (en) File management method and apparatus, computer device and storage medium
US20200226133A1 (en) Knowledge map building system and method
US10540666B2 (en) Method and system for updating an intent space and estimating intent based on an intent space
US11080287B2 (en) Methods, systems and techniques for ranking blended content retrieved from multiple disparate content sources
US20130110803A1 (en) Search driven user interface for navigating content and usage analytics
US10089311B2 (en) Ad-hoc queries integrating usage analytics with search results
JP2021525433A (en) Predicting potentially relevant topics based on searched / created digital media files
US20250219979A1 (en) Method and system for dynamically generating a card
US9767400B2 (en) Method and system for generating a card based on intent
US10719492B1 (en) Automatic reconciliation and consolidation of disparate repositories
US20170046401A1 (en) System and Method for Monitoring Internet Activity
CN113010790A (en) Content recommendation method, device, server and storage medium
US11016869B2 (en) Extensibility model for usage analytics used with a system
WO2024255428A1 (en) Sample processing method for cross-domain recommendation model, sample processing apparatus for cross-domain recommendation model, and electronic device, computer storage medium and computer program product
CN113569132A (en) Information retrieval display method and system
CN113076396A (en) Entity relationship processing method and system oriented to man-machine cooperation
CN112288510A (en) Item recommendation method, device, equipment and storage medium
CN120336628A (en) Table data processing method and device, computer equipment, storage medium, program product
US11762928B2 (en) Feature recommendation based on user-generated content
US20160188603A1 (en) Quotation management platform
Arslan Search and Sort Algorithms for Big Data Structures
US20240303887A1 (en) Systems and methods for identifying a design template matching a media item

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination