CN120578805A - Internet trade data search and processing method and system - Google Patents

Internet trade data search and processing method and system

Info

Publication number
CN120578805A
Authority
CN
China
Prior art keywords
data
trade
index
transaction
supplier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510664281.9A
Other languages
Chinese (zh)
Inventor
杨之语
卢麒琦
佟健
梁瑶
马宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiamusi University
Original Assignee
Jiamusi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiamusi University
Priority to CN202510664281.9A
Publication of CN120578805A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for searching and processing Internet trade data includes: collecting data from global trade websites, e-commerce platforms, customs databases and social media channels; cleaning the data with a machine learning algorithm to construct a unified multilingual semantic index library; constructing a multi-layer index system that indexes keywords based on inverted index technology, indexes entities based on a knowledge graph, and builds a time-series index according to transaction time; understanding the user's search intention by deep learning; providing screening and sorting functions based on multiple dimensions including price and market demand; recommending potential suppliers or trade opportunities; storing trade data on a blockchain; providing visual reports based on big data analysis; and displaying the trade activity intensity of different regions on a geographic heat map. By combining data across the dimensions of price, market demand and transaction frequency, the invention provides a more comprehensive and detailed analysis of trade activity and reflects market dynamics more accurately.

Description

Internet trade data search processing method and system
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to a method and a system for searching and processing internet trade data.
Background
Many current trade data analysis methods focus on a single dimension, such as transaction amount or transaction frequency, and neglect multi-dimensional comprehensive analysis, especially in combination with geographic information. Conventional approaches often fail to reflect dynamic market changes and regional trade activity intensity in real time. Modern trade data comes from different platforms, in different languages and formats, so integrating data from different sources is a major problem, and the prior art often lacks efficient mechanisms to process and fuse such multi-source heterogeneous data. Existing geographic information visualization techniques, such as geographic heat maps, can display regional data distributions intuitively, but most can only display static historical data and cannot reflect real-time market dynamics. In addition, existing GIS systems are inefficient at processing large-scale trade data. Most existing geographic information systems and trade data analysis platforms cannot update in real time or support dynamic geographic heat map display, and therefore cannot reflect the influence of market changes and emergencies in a timely manner.
Traditional prediction models also lack precision. Existing prediction models, such as time series models based on linear regression or ARIMA, generally have low prediction accuracy and, in particular, cannot effectively capture the complex dynamics and potential trading opportunities of the market when processing large-scale, multidimensional trade data. They also lack the ability to integrate multidimensional data: existing predictive models can generally predict only from a limited single data source (e.g., price or demand) and cannot jointly analyze multiple dimensions (e.g., supply chain stability, market demand fluctuations and regional policy changes).
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The present invention has been made in view of the above-mentioned or existing problems with the internet trade data search processing method and system.
In order to solve the technical problems, the invention provides the following technical scheme:
An embodiment of the invention provides an internet trade data search processing method, which comprises: collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data by using a machine learning algorithm, translating and standardizing the data in different languages, and constructing a unified multilingual semantic index library;
constructing a multi-layer index system, indexing keywords based on inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time by adopting a streaming data processing framework;
understanding the user search intention by deep learning, providing screening and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation and association rule mining technologies;
storing trade data by using a blockchain, and automatically verifying supplier qualification by using a smart contract;
providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.
As a preferred scheme of the internet trade data search processing method of the invention, cleaning the data by using a machine learning algorithm, translating and standardizing the data in different languages, and constructing a unified multilingual semantic index library comprises the following steps:
detecting abnormal prices, weights and transaction amounts by using a machine learning algorithm, finding outlying data points, and calculating an anomaly score:
$$S(x) = \frac{x - \mu}{\sigma}$$
wherein $S(x)$ is the anomaly score, $\mu$ is the data mean, and $\sigma$ is the standard deviation;
detecting duplicate data using Jaccard similarity:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
if $J(A, B) > 0.85$, the records are considered duplicate data and the lower-quality version is removed;
performing spelling and grammar correction by using a deep learning model, constructing an inverted index and a semantic index, storing text vectors produced by BERT embeddings in FAISS, and accelerating cross-language search through approximate nearest neighbor retrieval.
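As a preferred scheme of the internet trade data search processing method of the invention, constructing a multi-layer index system and indexing keywords based on inverted index technology comprises the following steps:
calculating the keyword weight using TF-IDF:
$$\text{TF-IDF}(t, D) = \text{TF}(t, D) \times \log\frac{N}{df_t}$$
wherein $\text{TF}(t, D)$ is the frequency of the word $t$ in document $D$, $N$ is the total number of documents, and $df_t$ is the number of documents containing the word;
storing keyword indexes in Elasticsearch, and ranking with BM25 at query time:
$$\text{BM25}(D, Q) = \sum_{t \in Q} \text{IDF}(t) \cdot \frac{f(t, D)\,(k + 1)}{f(t, D) + k\left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$
wherein $D$ is the document to be retrieved, $Q$ is the keyword set of the user query, $t$ is a keyword in the query, $f(t, D)$ is the frequency of $t$ in $D$, $|D|$ is the length of $D$, $\text{avgdl}$ is the average document length in the collection, and $k$ and $b$ are adjustment parameters.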
As a preferred scheme of the internet trade data search processing method of the invention, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time by adopting a streaming data processing framework comprises the following steps:
extracting the company, product, supplier, customer, transaction location and transaction amount from the raw trade data, identifying this information by using natural language processing and named entity recognition, storing the relationship data in a graph database, and merging similar entities with a clustering algorithm to resolve spelling differences of the same entity across data sources;
establishing a time-series index by timestamping the transaction data and building a time window index to optimize query performance;
connecting to the data sources, receiving new trade data in real time with Kafka or RabbitMQ as the message relay, parsing the new data, and identifying newly added or updated entities and transaction records.
As a preferred scheme of the internet trade data search processing method of the invention, understanding the user search intention by deep learning, providing screening and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation and association rule mining technologies comprises:
acquiring the historical quotations of suppliers, predicting and analyzing price fluctuation with an LSTM time series model, and calculating the optimal-price supplier:
wherein $P_s$ is the supplier quotation, $\Delta P_s$ is the price change trend over the past 6 months, and $\alpha$ is the price sensitivity parameter;
analyzing the market demand trend based on global trade data, predicting sales volume changes with ARIMA, and preferentially recommending suppliers of high-demand products if market demand increases;
calculating similar suppliers by item-based collaborative filtering, and mining high-frequency transaction patterns based on the Apriori algorithm:
if supplier A frequently transacts with B, then B is recommended as a potential partner.
As a preferred scheme of the internet trade data search processing method of the invention, storing trade data by using a blockchain and automatically verifying supplier qualification by using a smart contract comprises the following steps:
storing data in a Merkle tree structure, storing the trade data in IPFS distributed storage, and storing the data hash values on the blockchain; when a supplier registers, calling a smart contract through which the supplier submits authentication information; generating, by the blockchain, a unique identity; and, after authentication is passed, automatically updating the supplier's status by the smart contract.
As a preferred scheme of the internet trade data search processing method of the invention, providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction comprises the following steps:
calculating the trade activity intensity for each region; analyzing the fluctuation and trend of trade activity in each region over different time periods; visualizing the trade activity intensity of different regions as heat maps, with geographic data visualized through GeoPandas; carrying out short-term prediction according to the trend and seasonal variation of historical data; predicting trade demand over long time spans with a deep learning model; dividing the regions into different clusters and identifying the regions with active trade; and predicting the future trade opportunities of the regions according to multiple dimensions.
An internet trade data search processing system comprises: a collection and preprocessing module, used for collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data by using a machine learning algorithm, translating and standardizing the data in different languages, and constructing a unified multilingual semantic index library;
an index construction module, used for constructing a multi-layer index system, indexing keywords based on inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time by adopting a streaming data processing framework;
a search optimization module, used for understanding the user search intention by deep learning, providing screening and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation and association rule mining technologies;
an authentication and verification module, used for storing trade data by using a blockchain and automatically verifying supplier qualification by using a smart contract;
and a visualization and analysis module, used for providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.
A computing device, the computing device comprising:
At least one processor, memory, and input output unit;
wherein the memory is used for storing a computer program, and the processor is used for calling the computer program stored in the memory to execute the steps of the internet trade data searching processing method.
A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the steps of an internet trade data search processing method.
The beneficial effects of the invention are as follows. By combining data of multiple dimensions such as price, market demand, supply chain stability and transaction frequency, a more comprehensive and fine-grained trade activity analysis can be provided, and market dynamics can be reflected more accurately than with a traditional single-dimension analysis method. By combining geographic information with trade data, intuitive regional heat maps and trend graphs are formed that help users more easily identify the trade activity intensity and potential market opportunities of different regions. By adopting a streaming data processing framework, large-scale trade data can be updated and processed in real time, so that the analysis results always reflect the latest state of the market.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an internet trade data search processing method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an internet trade data search processing system according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a medium according to an embodiment of the present invention.
FIG. 4 schematically illustrates a structural diagram of a computing device in accordance with embodiments of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Examples
Referring now to fig. 1, fig. 1 is a flowchart illustrating a method for searching and processing internet trade data according to an embodiment of the present invention. It should be noted that embodiments of the present invention may be applied to any scenario where applicable.
The flow of the internet trade data searching processing method provided by the embodiment of the invention shown in fig. 1 comprises the following steps:
S1, collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data by using a machine learning algorithm, translating and standardizing the data in different languages, and constructing a unified multilingual semantic index library.
Preferably, a machine learning algorithm is used to detect abnormal prices, weights and transaction amounts, outlying data points are found, and an anomaly score is calculated:
$$S(x) = \frac{x - \mu}{\sigma}$$
wherein $S(x)$ is the anomaly score, $\mu$ is the data mean, and $\sigma$ is the standard deviation;
duplicate data is detected using Jaccard similarity:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
if $J(A, B) > 0.85$, the records are considered duplicate data and the lower-quality version is removed;
spelling and grammar correction are performed by using a deep learning model, an inverted index and a semantic index are constructed, text vectors produced by BERT embeddings are stored in FAISS, and cross-language search is accelerated through approximate nearest neighbor retrieval.
Further, assume a data set containing a plurality of trade transaction records, wherein each record includes a transaction price, weight and amount. The data set is as follows:
Calculating the data mean and standard deviation:
The mean and standard deviation are calculated for price, weight and amount, respectively.
For example, if the mean of the price is $\mu_{price} = 170$ and the standard deviation is $\sigma_{price} = 60$, then for transaction T003 the Z-score is:
$$S(3000) = \frac{3000 - 170}{60} \approx 47.17$$
Since the Z-score is much larger than 3, T003 is determined to be abnormal data.
Suppose two commodity descriptions are stored as data records A and B. Treating each description as a set and computing the Jaccard similarity gives $J(A, B) = 0.6$. Because this is less than the 0.85 threshold, the two records are not considered duplicates.
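The two cleaning checks in step S1 can be illustrated with a minimal Python sketch. It assumes the anomaly score is the ordinary z-score and that commodity descriptions are compared as sets of whitespace-separated tokens; the function names and the two sample descriptions are hypothetical, chosen only so that the numbers match the worked example (a score of about 47.17 and a similarity of 0.6).

```python
def anomaly_score(x, mu, sigma):
    """Z-score style anomaly score: distance from the mean in standard deviations."""
    return (x - mu) / sigma


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty descriptions are treated as identical
    return len(set_a & set_b) / len(union)


# Price statistics from the worked example: mean 170, standard deviation 60.
score = anomaly_score(3000, mu=170, sigma=60)
is_anomaly = abs(score) > 3  # threshold of 3 as used in the text

# Hypothetical descriptions (not from the source) that share 3 of 5 distinct tokens.
desc_a = "apple phone 64GB black".split()
desc_b = "apple phone 128GB black".split()
similarity = jaccard(desc_a, desc_b)
is_duplicate = similarity > 0.85  # duplicate threshold from the text

print(round(score, 2), is_anomaly, similarity, is_duplicate)
```

With real data the mean and standard deviation would be computed per field (price, weight, amount) before scoring each record.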
S2, constructing a multi-layer index system, indexing keywords based on inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time by adopting a streaming data processing framework.
Preferably, the keyword weights are calculated using TF-IDF:
$$\text{TF-IDF}(t, D) = \text{TF}(t, D) \times \log\frac{N}{df_t}$$
wherein $\text{TF}(t, D)$ is the frequency of the word $t$ in document $D$, $N$ is the total number of documents, and $df_t$ is the number of documents containing the word;
keyword indexes are stored in Elasticsearch and ranked with BM25 at query time:
$$\text{BM25}(D, Q) = \sum_{t \in Q} \text{IDF}(t) \cdot \frac{f(t, D)\,(k + 1)}{f(t, D) + k\left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$
wherein $D$ is the document to be retrieved, $Q$ is the keyword set of the user query, $t$ is a keyword in the query, $f(t, D)$ is the frequency of $t$ in $D$, $|D|$ is the length of $D$, $\text{avgdl}$ is the average document length in the collection, and $k$ and $b$ are adjustment parameters.
Preferably, the company, product, supplier, customer, transaction location and transaction amount are extracted from the raw trade data;
this information is identified by using natural language processing and named entity recognition, the relationship data is stored in a graph database, and similar entities are merged with a clustering algorithm to resolve spelling differences of the same entity across data sources;
a time-series index is established by timestamping the transaction data, and a time window index is built to optimize query performance;
the data sources are connected, new trade data is received in real time with Kafka or RabbitMQ as the message relay, the new data is parsed, and newly added or updated entities and transaction records are identified.
Further, assume that a document set exists in which document 1 is "apple phone 64GB black". In this document, the word "phone" occurs once and the total word count is 4, so TF("phone", D1) = 1/4 = 0.25.
Assume that the document set has 3 documents and that "phone" appears in all 3 (i.e., $df_t$("phone") = 3 and the total number of documents N = 3). Then IDF("phone") = log(3/3) = log(1) = 0. An IDF value of 0 means that "phone" carries very little information in this document set and does not help distinguish documents. Combining TF and IDF for document 1: TF-IDF("phone", D1) = 0.25 × 0 = 0.
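The TF-IDF arithmetic above can be reproduced with a short Python sketch. Only document 1 comes from the text; the other two documents, the function names `tf` and `idf`, and the use of the natural logarithm are assumptions made for illustration (the example result is 0 for any log base).

```python
import math


def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by the document length."""
    return doc_tokens.count(term) / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency log(N / df_t); 0.0 if the term never occurs."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return math.log(len(corpus) / df)


corpus = [
    ["apple", "phone", "64GB", "black"],     # document 1 from the text
    ["samsung", "phone", "128GB", "white"],  # hypothetical document 2
    ["huawei", "phone", "case"],             # hypothetical document 3
]

tf_phone = tf("phone", corpus[0])        # 1 / 4
idf_phone = idf("phone", corpus)         # log(3 / 3)
tfidf_phone = tf_phone * idf_phone       # a term present everywhere gets weight 0
tfidf_apple = tf("apple", corpus[0]) * idf("apple", corpus)  # 0.25 * log(3)

print(tf_phone, idf_phone, tfidf_phone, tfidf_apple)
```

The contrast between "phone" and "apple" shows why a word that appears in every document contributes nothing to ranking, while a word unique to one document receives a positive weight.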
In Elasticsearch, an index containing the documents is first created. Each document contains a plurality of fields:
The product document data is inserted into the Elasticsearch index. The text fields in each document are tokenized, and the index information is stored:
POST /products/_doc/1
{
  "name": "apple phone 64GB black",
  "description": "The apple phone is 64GB and black, and is suitable for various applications"
}
Elasticsearch automatically splits the text fields in the document into words and generates a corresponding index entry for each word. At query time, Elasticsearch calculates the weight of each word based on the term frequency and the inverse document frequency.
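To make the query-time ranking concrete, the following is a minimal pure-Python sketch of BM25 scoring over tokenized documents. It is not the Elasticsearch implementation: the IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, the parameter values k = 1.2 and b = 0.75, the function name `bm25_score` and the sample corpus are all assumptions made for illustration.

```python
import math


def bm25_score(query_terms, doc, corpus, k=1.2, b=0.75):
    """Score one tokenized document against a query using the BM25 formula."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        if df == 0:
            continue  # a term absent from the corpus contributes nothing
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        freq = doc.count(term)  # f(t, D)
        norm = freq + k * (1 - b + b * len(doc) / avgdl)
        score += idf * freq * (k + 1) / norm
    return score


# Hypothetical tokenized product names.
corpus = [
    ["apple", "phone", "64GB", "black"],
    ["samsung", "phone", "128GB", "white"],
    ["black", "phone", "case"],
]
query = ["apple", "phone"]

# Rank document positions from best to worst match.
ranked = sorted(
    range(len(corpus)),
    key=lambda i: bm25_score(query, corpus[i], corpus),
    reverse=True,
)
print(ranked)
```

The document containing both query terms ranks first; among documents matching only "phone", the length normalisation controlled by b favours the shorter one.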
Further, Neo4j is selected as the graph database for storing the extracted entities and their relationships. Each node represents an entity. The entities are connected by relationships, such as "supplier A" supplies "product X" and "customer X" purchases "product Y".
Nodes of different types are created: company nodes store information such as company name, address and industry; product nodes store information such as product name, description and price; supplier nodes store information such as supplier name, address and supply capacity; and customer nodes store information such as customer name and purchase history. Graph relationships are created according to the relationships extracted from the data, for example "supplier A" supplies "product X", "customer X" purchases "product Y", and a "transaction" occurs in "New York". Transaction records are stored as relationships that represent the connections among nodes, and the transaction data is connected to the related nodes through timestamps indicating the point in time at which the transaction occurred.
A timestamp is added to each transaction record to indicate the specific time at which the transaction occurred. According to the transaction time, the transaction data is divided into different time windows for storage. This speeds up time-based queries, such as querying all transaction records for a given month. In Elasticsearch, a time-series index may be created using a timestamp field, and at query time, filtering on a time range improves query performance.
Query performance is further optimized by using the time window index together with a sharding strategy.
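The time window idea can be sketched in a few lines of Python: transactions are bucketed by calendar month so that a query for one month reads a single bucket instead of scanning every record. The monthly granularity, the function name `build_time_window_index`, the record layout and the sample transactions are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def build_time_window_index(transactions):
    """Group transaction IDs into (year, month) windows keyed by their timestamp."""
    index = defaultdict(list)
    for tx in transactions:
        ts = datetime.fromisoformat(tx["timestamp"])
        index[(ts.year, ts.month)].append(tx["id"])
    return index


# Hypothetical timestamped transaction records.
transactions = [
    {"id": "T001", "timestamp": "2024-01-15T09:30:00"},
    {"id": "T002", "timestamp": "2024-01-28T16:05:00"},
    {"id": "T003", "timestamp": "2024-02-03T11:45:00"},
]

index = build_time_window_index(transactions)
january = index[(2024, 1)]   # all January 2024 transactions in one lookup
february = index[(2024, 2)]
print(january, february)
```

A production system would typically delegate this to the search engine or database, for example one index or shard per time window, but the lookup pattern is the same.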
S3, understanding the user search intention by deep learning, providing screening and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation and association rule mining technologies.
Preferably, the historical quotations of suppliers are obtained, price fluctuation is predicted and analyzed with an LSTM time series model, and the optimal-price supplier is calculated:
wherein $P_s$ is the supplier quotation, $\Delta P_s$ is the price change trend over the past 6 months, and $\alpha$ is the price sensitivity parameter;
the market demand trend is analyzed based on global trade data, sales volume changes are predicted with ARIMA, and suppliers of high-demand products are recommended preferentially if market demand increases;
similar suppliers are calculated by item-based collaborative filtering, and high-frequency transaction patterns are mined based on the Apriori algorithm:
if supplier A frequently transacts with B, then B is recommended as a potential partner.
Further, the system obtains supplier quotation data including supplier ID, product ID, quotation date, quotation amount and transaction amount, and predicts future prices by using an LSTM model. The inputs are the price change trend ($\Delta P_s$) and the market demand parameter ($D$) of the past 6 months, and the output is the price for the next 3 months. Data windowing converts the time series into fixed-length inputs, with the prices of the past 6 months as the input and the price of the next month as the target. The training set accounts for 80% of the data and the test set for 20%:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build the LSTM model: each sample is a window of 6 monthly prices with 1 feature
model = Sequential([
    LSTM(50, activation='relu', return_sequences=True, input_shape=(6, 1)),
    LSTM(50, activation='relu'),
    Dense(1),  # predicted price for the next month
])
model.compile(optimizer='adam', loss='mse')

# Train the model; X_train, y_train, X_test and y_test are the windowed 80/20 splits
model.fit(X_train, y_train, epochs=50, batch_size=16, validation_data=(X_test, y_test))
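The windowing and 80/20 split that the training call above relies on are not shown in the text. A minimal plain-Python sketch follows, assuming a chronological (unshuffled) split; the helper names `make_windows` and `chronological_split` and the 16 monthly prices are hypothetical.

```python
def make_windows(prices, window=6):
    """Turn a price series into (past `window` prices, next price) pairs."""
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])
        targets.append(prices[start + window])
    return inputs, targets


def chronological_split(inputs, targets, train_ratio=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(inputs) * train_ratio)
    return inputs[:cut], targets[:cut], inputs[cut:], targets[cut:]


# Hypothetical monthly quotations for one supplier and product.
prices = [100, 102, 101, 105, 107, 110, 108, 112,
          115, 117, 116, 120, 123, 125, 124, 128]

X, y = make_windows(prices, window=6)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_ratio=0.8)
print(len(X_train), len(X_test), X[0], y[0])
```

Before being passed to a Keras model with `input_shape=(6, 1)`, the lists would need to be converted to arrays of shape `(samples, 6, 1)`.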
Future market demand is predicted from historical trading volume: given the trading volume time series, the sales volume forecast for the next 3 months is output. Stationarity is checked with the Augmented Dickey-Fuller (ADF) test:
from statsmodels.tsa.stattools import adfuller
result = adfuller(sales_data)
print("p-value:", result[1])  # p < 0.05 indicates that the series is stationary
If the data is not stationary, differencing is applied:
sales_data_diff = sales_data.diff().dropna()
Determining the AR and MA orders and fitting the model:
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(sales_data, order=(2, 1, 2))
model_fit = model.fit()
forecast = model_fit.forecast(steps=3)
print('Forecast sales for the next 3 months:', forecast)
If market demand grows over the next 3 months ($\Delta D > 0$), suppliers of high-sales products are recommended preferentially:
recommended_supplier = supplier_with_high_sales
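The rule "if supplier A frequently transacts with B, recommend B" can be sketched as a simplified Apriori pass restricted to item pairs: frequent single parties are found first, and candidate pairs are built only from those. This is an illustrative reduction, not the full Apriori algorithm; the function names, the minimum support of 0.4 and the sample deals are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for party pairs meeting the minimum support."""
    n = len(transactions)
    # Level 1: keep only parties that are frequent on their own (Apriori pruning).
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Level 2: count candidate pairs built from frequent parties only.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def recommend_partners(party, pairs):
    """List partners of `party`, most frequently co-occurring first."""
    partners = []
    for (a, b), _support in sorted(pairs.items(), key=lambda kv: -kv[1]):
        if a == party:
            partners.append(b)
        elif b == party:
            partners.append(a)
    return partners


# Hypothetical deals, each listing the parties involved.
deals = [["A", "B"], ["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"]]
pairs = frequent_pairs(deals, min_support=0.4)
partners_of_a = recommend_partners("A", pairs)
print(pairs, partners_of_a)
```

Here A and B appear together in 3 of 5 deals, so B is the first recommendation for A, followed by C.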
S4, storing trade data by using a blockchain, and automatically verifying supplier qualification by using a smart contract.
Preferably, data is stored in a Merkle tree structure, the trade data is stored in IPFS distributed storage, and the data hash values are stored on the blockchain; when a supplier registers, a smart contract is called through which the supplier submits authentication information; the blockchain generates a unique identity; and after authentication is passed, the smart contract automatically updates the supplier's status.
Further, the supplier registration smart contract:
the trade data storage smart contract:
uploading transaction data to IPFS:
computing the Merkle tree root hash:
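As an illustration of the Merkle root computation only, the following Python sketch hashes each record with SHA-256 and combines hashes pairwise until one root remains. The choice of SHA-256, the convention of duplicating the last node on odd-sized levels, the function names and the sample records are assumptions; the source does not specify them, and the smart contract and IPFS steps are not covered here.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Raw SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(records):
    """Compute the Merkle root (hex) of a non-empty list of byte-string records."""
    if not records:
        raise ValueError("at least one record is required")
    level = [sha256(r) for r in records]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the last node
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Hypothetical serialized trade records.
records = [b"T001|supplierA|1000", b"T002|supplierB|2500", b"T003|supplierA|3000"]
root = merkle_root(records)

# Changing any record changes the root, which is what makes tampering detectable.
tampered_root = merkle_root([b"T001|supplierA|9999"] + records[1:])
print(root, root != tampered_root)
```

Only the root would need to be written to the chain; the records themselves can live in off-chain storage and be verified against it.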
S5, providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.
Preferably, the trade activity intensity is calculated for each region; the fluctuation and trend of trade activity in each region over different time periods are analyzed; the trade activity intensity of different regions is visualized as heat maps, with geographic data visualized through GeoPandas; short-term prediction is carried out according to the trend and seasonal variation of historical data; trade demand over long time spans is predicted with a deep learning model; the regions are divided into different clusters and the regions with active trade are identified; and the future trade opportunities of the regions are predicted according to multiple dimensions.
Further, the total trade amount, the number of transactions and the import and export amount are combined through weights to obtain the trade activity intensity index of each region, which quantifies how active trade is in that region.
The index is calculated as follows:
$$T_i = w_1 \times \sum \text{transaction amount} + w_2 \times \sum \text{number of transactions} + w_3 \times \sum \text{import/export amount}$$
wherein $T_i$ is the trade activity intensity index of region $i$, and $w_1, w_2, w_3$ are weights that can be optimized using historical data;
the daily, weekly and monthly average trade growth rates are calculated, and seasonal trends and periodic fluctuation patterns are identified.
Short-term trend prediction is carried out with an autoregressive integrated moving average (ARIMA) model to predict trade changes over the next 1 to 6 months; a long short-term memory (LSTM) network is used to predict trade demand changes over the next 1 to 2 years, combined with market economic data, policy adjustment information and consumption trends to improve prediction accuracy; a K-means clustering algorithm divides the regions into different categories to identify trade-active regions; and an XGBoost machine learning model predicts the regions with higher future growth potential so as to optimize market layout.
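The trade activity intensity index described in S5 is a weighted sum and can be sketched directly. The weights (0.5, 0.3, 0.2), the per-period record layout, the function name `trade_intensity` and the regional figures below are all hypothetical; the text only says the weights can be optimized from historical data.

```python
def trade_intensity(periods, weights):
    """Weighted sum of total amount, transaction count and import/export amount."""
    w1, w2, w3 = weights
    total_amount = sum(p["amount"] for p in periods)
    total_count = sum(p["count"] for p in periods)
    total_import_export = sum(p["import_export"] for p in periods)
    return w1 * total_amount + w2 * total_count + w3 * total_import_export


# Hypothetical per-period aggregates for two regions.
regions = {
    "Region A": [
        {"amount": 1000, "count": 10, "import_export": 400},
        {"amount": 2000, "count": 20, "import_export": 600},
    ],
    "Region B": [
        {"amount": 500, "count": 5, "import_export": 100},
    ],
}
weights = (0.5, 0.3, 0.2)  # assumed values for w1, w2, w3

intensity = {name: trade_intensity(p, weights) for name, p in regions.items()}
most_active = max(intensity, key=intensity.get)
print(intensity, most_active)
```

Because amounts and counts are on very different scales, the three terms would probably need to be normalised before weighting in practice; the resulting per-region values are what a heat map layer would colour by.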
Having described the method of the exemplary embodiment of the present invention, an Internet trade data search and processing system according to the exemplary embodiment of the present invention will be described with reference to fig. 2. The system includes:
a collection and preprocessing module, used for collecting data from global trade websites, e-commerce platforms, customs databases, and social media channels, cleaning the data using a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library;
an index construction module, used for constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework;
a search optimization module, used for understanding user search intent using deep learning, providing screening and sorting functions based on the multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining;
an evidence storage and verification module, used for storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts;
and a visualization and analysis module, used for providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.
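To make the index construction module more concrete, the following is a minimal in-memory sketch of a keyword inverted index combined with a time-ordered index. It is a toy under stated assumptions, not the system described here: the class name `TradeIndex`, whitespace tokenization, integer record ids, and ISO-format date strings are all hypothetical choices.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class TradeIndex:
    """Toy keyword inverted index plus a time-ordered index (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> set of record ids
        self.by_time = []                 # sorted list of (ISO date, record id)

    def add(self, record_id, text, date):
        # Inverted index: map every token to the records that contain it.
        for token in text.lower().split():
            self.postings[token].add(record_id)
        # Time index: keep (date, id) pairs sorted so ranges can be bisected.
        insort(self.by_time, (date, record_id))

    def search(self, query, start=None, end=None):
        tokens = query.lower().split()
        if not tokens:
            return []
        # A record matches only if it contains every query token.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        if start is not None or end is not None:
            lo = bisect_left(self.by_time, (start or "", float("-inf")))
            hi = bisect_right(self.by_time, (end or "9999-12-31", float("inf")))
            hits &= {rid for _, rid in self.by_time[lo:hi]}
        return sorted(hits)


index = TradeIndex()
index.add(1, "steel pipe supplier Shanghai", "2025-01-10")
index.add(2, "steel coil exporter", "2025-03-05")
index.add(3, "copper pipe supplier", "2025-04-20")
print(index.search("steel"))
print(index.search("steel", start="2025-02-01", end="2025-12-31"))
```

A streaming update, as the module describes, would amount to calling `add` for each newly received record; ranking and knowledge-graph entity lookup are left out of this sketch.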
Having described the method and apparatus of the exemplary embodiments of the present invention, a computer-readable storage medium of the exemplary embodiments is next described with reference to fig. 3. The computer-readable storage medium is shown as an optical disc 30, on which is stored a computer program (i.e., a program product) that, when executed by a processor, implements the steps described in the above method embodiments, for example: collecting data from global trade websites, e-commerce platforms, customs databases, and social media channels, cleaning the data using a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library; constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework; understanding user search intent using deep learning, providing screening and sorting functions based on the multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining; storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts; and providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
Having described the methods, apparatus, and media of the exemplary embodiments of the present invention, a computing device for Internet trade data search and processing according to the exemplary embodiments is next described with reference to fig. 4.
FIG. 4 illustrates a block diagram of an exemplary computing device 40 suitable for implementing embodiments of the invention. The computing device 40 may be a computer system or a server. The computing device 40 shown in fig. 4 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present invention.
As shown in FIG. 4, components of computing device 40 may include, but are not limited to, one or more processors or processing units 401, a system memory 402, and a bus 403 that connects the different system components (including system memory 402 and processing units 401).
Computing device 40 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computing device 40 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 4021 and/or cache memory 4022. Computing device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, ROM4023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4 and commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided. In such cases, each drive may be coupled to bus 403 through one or more data medium interfaces. The system memory 402 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 4025 having a set (at least one) of program modules 4024 may be stored in, for example, system memory 402, and such program modules 4024 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 4024 generally perform the functions and/or methodologies of the described embodiments of the present invention.
Computing device 40 may also communicate with one or more external devices 404 (e.g., keyboard, pointing device, display, etc.). Such communication may occur through an input/output (I/O) interface 405. Moreover, computing device 40 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 406. As shown in fig. 4, network adapter 406 communicates with other modules of computing device 40, such as processing unit 401, etc., over bus 403. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with computing device 40.
The processing unit 401 performs various functional applications and data processing by running programs stored in the system memory 402, for example: collecting data from global trade websites, e-commerce platforms, customs databases, and social media channels, cleaning the data using a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library; constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework; understanding user search intent using deep learning, providing screening and sorting functions based on the multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining; storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts; and providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.
The specific implementation of each step is not repeated here. It should be noted that while several units/modules or sub-units/sub-modules of the Internet trade data search and processing system are mentioned in the above detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
In the description of the present invention, it should be noted that the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device, or unit, and may be in electrical, mechanical, or other form.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It should be noted that the foregoing embodiments are merely specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present invention, the technical solutions described in the foregoing embodiments may still be modified, variations may readily be conceived, and some of the technical features may be replaced by equivalents. Such modifications, variations, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.

Claims (10)

1. A method for searching and processing Internet trade data, characterized by comprising:
collecting data from global trade websites, e-commerce platforms, customs databases, and social media channels, cleaning the data using a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library;
constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework;
understanding user search intent using deep learning, providing screening and sorting functions based on the multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining;
storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts;
providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.

2. The Internet trade data search and processing method according to claim 1, characterized in that cleaning the data using a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library comprises:
detecting abnormal prices, weights, and transaction amounts using a machine learning algorithm, finding data points with abnormal density, and computing an anomaly score S(x), where S(x) is the anomaly score, μ is the data mean, and σ is the standard deviation;
detecting duplicate data using the Jaccard similarity J(A,B), wherein if J(A,B) > 0.85 the records are regarded as duplicates and the lower-quality version is removed;
performing spelling and grammar correction using a deep learning model, and constructing an inverted index and a semantic index, using FAISS and BERT embeddings to store text vectors and approximate nearest neighbor search to accelerate cross-language search.

3. The Internet trade data search and processing method according to claim 1, characterized in that constructing a multi-layer index system and indexing keywords based on an inverted index technology comprises:
calculating keyword weights using TF-IDF, where N is the total number of documents and df_t is the number of documents containing the term;
storing the keyword index in Elasticsearch and ranking with BM25 at query time, where D is the document to be retrieved, Q is the set of keywords in the user query, t is a keyword in the query, and k and b are tuning parameters.

4. The Internet trade data search and processing method according to claim 1, characterized in that indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework comprises:
extracting companies, products, suppliers, customers, transaction locations, and transaction amounts from raw trade data, identifying the information using natural language processing and named entity recognition, storing relational data in a graph database, and merging similar entities with a clustering algorithm to resolve spelling differences of the same entity across data sources;
establishing a time series index by timestamping transaction data and building time window indexes to optimize query performance;
connecting to data sources, receiving new trade data in real time, using Kafka or RabbitMQ as a data relay, parsing the new data, and identifying new or updated entities and transaction records.

5. The Internet trade data search and processing method according to claim 1, characterized in that understanding user search intent using deep learning, providing screening and sorting functions based on the multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining comprises:
obtaining the historical quotations of suppliers, analyzing price fluctuations with LSTM time series prediction, and calculating the supplier with the optimal price, where P_s is the supplier's quotation, ΔP_s is the price change trend over the past 6 months, and α is the price sensitivity parameter;
analyzing market demand trends based on global trade data and predicting sales changes using ARIMA, wherein if market demand grows, suppliers of high-demand products are recommended preferentially;
computing similar suppliers using item-based collaborative filtering and mining high-frequency transaction patterns based on the Apriori algorithm, wherein if a supplier A frequently trades with B, B is recommended as a potential partner.

6. The Internet trade data search and processing method according to claim 1, characterized in that storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts comprises:
storing data in a Merkle tree structure, storing trade data in IPFS distributed storage, and storing the data hash value on the blockchain; when a supplier registers, a smart contract is called to submit authentication information; after the supplier submits the authentication information, the blockchain generates a unique identity; and after the authentication is passed, the smart contract automatically updates the supplier status.

7. The Internet trade data search and processing method according to claim 1, characterized in that providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction comprises:
calculating the trade activity intensity of each region, and analyzing the fluctuations and trends of each region's transaction activity over different time periods; visualizing the trade activity intensity of different regions as a heat map, using GeoPandas for geographic data visualization; making short-term forecasts based on the trends and seasonal changes in historical data, and predicting trade demand over long time spans with a deep learning model; dividing regions into different clusters, identifying regions with active trade, and predicting the future trade opportunities of regions based on multiple dimensions.

8. An Internet trade data search and processing system, characterized by comprising:
a collection and preprocessing module, used for collecting data from global trade websites, e-commerce platforms, customs databases, and social media channels, cleaning the data using a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library;
an index construction module, used for constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework;
a search optimization module, used for understanding user search intent using deep learning, providing screening and sorting functions based on the multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining;
an evidence storage and verification module, used for storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts;
a visualization and analysis module, used for providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.

9. A computing device, comprising:
at least one processor, a memory, and an input/output unit;
wherein the memory is used to store a computer program, and the processor is used to call the computer program stored in the memory to execute the steps of the Internet trade data search and processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the steps of the Internet trade data search and processing method according to any one of claims 1 to 7.
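The data-cleaning step of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the anomaly score is assumed to be the absolute z-score |x − μ| / σ, which is consistent with the symbols μ and σ defined in the claim; the Jaccard similarity is the standard |A ∩ B| / |A ∪ B|; and comparing records as sets of lowercase whitespace-separated tokens is a hypothetical choice. The function names are likewise illustrative.

```python
from statistics import mean, pstdev


def anomaly_scores(values):
    """Absolute z-score of each value (assumed form of S(x))."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0] * len(values)  # all values identical: nothing is anomalous
    return [abs(v - mu) / sigma for v in values]


def jaccard(a, b):
    """Standard Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def is_duplicate(text_a, text_b, threshold=0.85):
    """Flag two records as duplicates when J(A, B) exceeds the 0.85 threshold."""
    return jaccard(text_a.lower().split(), text_b.lower().split()) > threshold


# Hypothetical unit prices: the last quotation stands out from the rest.
prices = [10, 10, 10, 10, 100]
print(anomaly_scores(prices))
print(is_duplicate("Steel pipe 20mm seamless", "steel PIPE 20mm seamless"))
```

Which cutoff on S(x) marks a record as anomalous is not stated in the claim, so the sketch only returns the scores and leaves that decision to the caller.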
CN202510664281.9A 2025-05-22 2025-05-22 Internet trade data search and processing method and system Pending CN120578805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510664281.9A CN120578805A (en) 2025-05-22 2025-05-22 Internet trade data search and processing method and system


Publications (1)

Publication Number Publication Date
CN120578805A true CN120578805A (en) 2025-09-02

Family

ID=96862022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510664281.9A Pending CN120578805A (en) 2025-05-22 2025-05-22 Internet trade data search and processing method and system

Country Status (1)

Country Link
CN (1) CN120578805A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination