CN114090901A - Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device - Google Patents

Info

Publication number
CN114090901A
Authority
CN
China
Prior art keywords
commodity
value
commodities
picture
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111367617.3A
Other languages
Chinese (zh)
Other versions
CN114090901B (en
Inventor
李斌
丁建伟
刘志洁
李航
陈周国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN202111367617.3A
Publication of CN114090901A
Application granted
Publication of CN114090901B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Finance (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a dark net similar commodity judgment method based on multimode fusion characteristics, a storage medium and a computing device. The method comprises: step 10, collecting dark net commodity data and classifying the commodities, wherein the collected dark net commodity data comprise commodity text and commodity pictures, and an md5 value is generated for each collected commodity picture; step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data collected in step 10 and the commodity classification result; step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text. By constructing a similarity calculation method that fuses commodity pictures and commodity text, the invention addresses the difficulty of judging similar commodities on the dark net, where commodity pictures are blurred and text information is sparse.

Figure 202111367617

Description

Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device
Technical Field
The invention relates to the technical field of dark net similar commodity judgment, in particular to a dark net similar commodity judgment method, a storage medium and a computing device based on multimode fusion characteristics.
Background
A darknet market is a commercial website that specializes in trading illegal commodities. Such markets are accessed through a darknet (e.g., Tor) and differ from open-web e-commerce sites in specialization, technology, and primary support. Most markets are designed to facilitate transactions between buyers and sellers of illegal goods; because they host a large number of sellers, many of the listed goods are extremely similar or even identical. To better monitor darknet market transactions and obtain timely information on the various commodities, as much site and commodity information as possible must be collected from darknet markets. Beyond collection, the main work is to classify and count the commodities, filter out similar commodities, discover new commodities and issue early warnings. Determining which commodities are similar is therefore essential.
At present, because commodity pictures on open-web e-commerce websites are very clear and the text descriptions are detailed, similar commodities can generally be judged by picture similarity or text similarity alone. In addition, collaborative filtering algorithms are used to identify similar commodities and recommend them to buyers. By contrast, dark net commodities have blurred pictures and sparse text, which makes similar commodities difficult to judge, and few methods for this purpose are available.
Disclosure of Invention
The invention aims to provide a dark net similar commodity judgment method, a storage medium and a computing device based on multimode fusion characteristics, and aims to solve the problems that dark net commodity images are fuzzy, text information is simple, and similar commodities are difficult to judge.
The invention provides a dark net similar commodity judgment method based on multimode fusion characteristics, which comprises the following steps:
step 10, collecting dark net commodity data and classifying commodities; the collected dark net commodity data comprise commodity characters and commodity pictures, and an md5 value is generated for the collected commodity pictures;
step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data collected in step 10 and the commodity classification result;
and step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text.
Further, step 10 comprises the following sub-steps:
step 11, constructing a customized acquisition strategy aiming at a data typesetting format and a reverse-crawling mechanism of a dark net target site, and realizing acquisition of dark net commodity data, wherein the acquired dark net commodity data comprises structured commodity characters and commodity pictures of a commodity detail page; the commodity characters comprise a commodity id, a commodity name and a commodity description;
step 12, for the commodity with the commodity picture, acquiring the commodity picture, simultaneously acquiring an md5 value of the commodity picture by using a general md5 calculation method, taking the md5 value as the name of the commodity picture, storing the commodity picture in a Seaweed database according to a set storage position, and generating a corresponding storage address string;
and step 13, classifying the commodities, adding first-level and second-level commodity labels, and storing the commodity labels, the collected commodity text, the md5 value of the commodity picture and the storage address string in an ES database.
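The md5 naming in step 12 can be sketched as follows. This is a minimal illustration that assumes the picture is already available as raw bytes; the function names and the file extension are hypothetical and do not come from the patent.

```python
import hashlib


def picture_md5(image_bytes: bytes) -> str:
    """Return the hex MD5 digest of the raw picture bytes."""
    return hashlib.md5(image_bytes).hexdigest()


def picture_name(image_bytes: bytes, extension: str = "jpg") -> str:
    """Name the picture after its MD5 value (the extension is an assumption)."""
    return f"{picture_md5(image_bytes)}.{extension}"
```

Because the digest depends only on the bytes, two byte-identical pictures receive the same md5 value, which is the property that the md5 comparison in step 34 relies on.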
Further, step 20 comprises the following sub-steps:
step 21, reading from the ES database the commodity text, which comprises the commodity id, the commodity name and the commodity description, together with the md5 value of the commodity picture, and, for each commodity whose md5 value is not empty, acquiring the commodity picture from the Seaweed database according to the corresponding storage address string;
step 22, combining the commodity id, the commodity name and the commodity description as complete commodity characters;
step 23, calculating a perceptual hash fingerprint value of the commodity picture;
step 24, calculating the Word2Vec sentence vector value of the commodity text;
and step 25, storing the data obtained in the steps 21, 23 and 24 into a MySQL commodity feature vector table.
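Steps 22 and 24 can be illustrated with a small sketch. The patent treats the sentence vector calculation as prior art and does not say how word vectors are combined, so averaging the word vectors is an assumption here, as are the whitespace tokenization, the toy embedding table and all names.

```python
from typing import Dict, List

# Hypothetical toy embeddings; a real system would load a trained Word2Vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "red": [1.0, 0.0, 0.0],
    "blue": [0.0, 1.0, 0.0],
    "widget": [0.0, 0.0, 1.0],
}


def full_text(product_id: str, name: str, description: str) -> str:
    """Step 22: join id, name and description into one commodity text."""
    return " ".join([product_id, name, description])


def sentence_vector(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Step 24 (assumed variant): average the vectors of the known tokens."""
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No token has an embedding; fall back to a zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```

Tokens without an embedding, such as the commodity id, are simply skipped in this sketch.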
Further, step 30 comprises the following sub-steps:
step 31, reading the commodity id of a new commodity, recorded as id1; reading the md5 value of the commodity picture of the new commodity, recorded as md5_1; reading the perceptual hash fingerprint value of the commodity picture of the new commodity, recorded as h1; reading the Word2Vec sentence vector value of the commodity text of the new commodity, recorded as v1; and reading the first-level and second-level commodity labels of the new commodity, recorded as c and t;
step 32, reading from the MySQL commodity feature vector table the attribute values of the commodities whose first-level and second-level commodity labels are the same as those of the new commodity, wherein the attribute values comprise the commodity id, the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text;
step 33, selecting in sequence, from the commodities obtained in step 32, one commodity that has not yet been compared, and recording its commodity id as id2, the md5 value of its commodity picture as md5_2, the perceptual hash fingerprint value of its commodity picture as h2, and the Word2Vec sentence vector value of its commodity text as v2;
step 34, if md5_1 = md5_2, setting the similarity s of the two commodities to 1;
step 35, if the condition in step 34 is not satisfied, then:
(1) calculating the Hamming distance between the perceptual hash fingerprint values of the commodity pictures of the two commodities, d = hamming_dist(h1, h2);
(2) calculating the cosine similarity between the Word2Vec sentence vector values of the commodity text of the two commodities, c = cos_similarity(v1, v2);
(3) setting the similarity of the two commodities to s = (1/ln(e + d/10) + c)/2, wherein e is the base of the natural logarithm;
step 36, comparing the similarity s obtained in step 34 or step 35 with a preset similarity threshold λ, and retaining the pairs of commodities whose similarity s is greater than or equal to λ;
step 37, combining the commodity ids and the similarity s of each pair of commodities satisfying the condition of step 36 into a triple (id1, id2, s) and storing the triple in the MySQL commodity similarity table;
and step 38, returning to step 33 until all the commodities have been compared.
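The fusion formula of step 35 can be written out as below. The boundary case is derived from the formula itself and is not stated in the patent; it assumes that d is a non-negative Hamming distance and that c is a cosine similarity in [-1, 1].

```latex
s = \frac{1}{2}\left(\frac{1}{\ln\left(e + d/10\right)} + c\right),
\qquad d = \mathrm{hamming\_dist}(h_1, h_2),
\quad c = \mathrm{cos\_similarity}(v_1, v_2)

d = 0 \;\Rightarrow\; \frac{1}{\ln e} = 1 \;\Rightarrow\; s = \frac{1 + c}{2}
```

Under these assumptions the picture term 1/ln(e + d/10) equals 1 for identical fingerprints and decreases monotonically as d grows, so s reaches 1 only when d = 0 and c = 1.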
The invention also provides a computer terminal storage medium storing computer-terminal-executable instructions, wherein the instructions are used for executing the above dark net similar commodity judgment method based on multimode fusion characteristics.
The present invention also provides a computing apparatus, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to execute the above dark net similar commodity judgment method based on multimode fusion characteristics.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. By constructing a similarity calculation method that fuses commodity pictures and commodity text, the invention can calculate the similarity between commodities in dark net markets and obtain the similar commodities within each category, which helps classify dark net market commodities and improves the accuracy of commodity similarity judgment. The approach is simple and highly interpretable, and it addresses the difficulty of judging similar commodities when dark net commodity pictures are blurred and text information is sparse.
2. Through dark net data acquisition, commodity picture feature calculation, commodity text feature calculation and similarity calculation, the invention can effectively monitor the trading of new commodities, provide real-time early warning and better track dark net market dynamics.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a general flowchart of the dark net similar commodity judgment method based on multimode fusion characteristics according to an embodiment of the present invention.
Fig. 2 is a flowchart of step 10 of the dark net similar commodity judgment method based on multimode fusion characteristics according to an embodiment of the present invention.
Fig. 3 is a flowchart of step 20 of the dark net similar commodity judgment method based on multimode fusion characteristics according to an embodiment of the present invention.
Fig. 4 is a flowchart of step 30 of the dark net similar commodity judgment method based on multimode fusion characteristics according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in fig. 1, the present embodiment provides a dark web similar product determination method based on multi-mode fusion features, including the following steps:
step 10, collecting dark net commodity data and classifying the commodities; the collected dark net commodity data comprise commodity text and commodity pictures, and an md5 value is generated for each collected commodity picture. Specifically, a dark net data acquisition technique is used to collect structured commodity text data, including the commodity id, commodity name and commodity description, and, for commodities that have commodity pictures, the corresponding pictures are collected at the same time. An md5 value is then generated for each commodity picture, the commodities are classified, the structured data are stored in an ES database, and the commodity pictures are stored in a Seaweed database. As shown in fig. 2, step 10 comprises the following sub-steps:
step 11, constructing a customized acquisition strategy aiming at a data typesetting format and a reverse-crawling mechanism of a dark net target site, and realizing acquisition of dark net commodity data, wherein the acquired dark net commodity data comprises structured commodity characters and commodity pictures of a commodity detail page; the commodity characters comprise commodity id, commodity names and commodity descriptions;
step 12, for each commodity with a commodity picture, acquiring the commodity picture and at the same time obtaining the md5 value of the picture using a general md5 calculation method, taking the md5 value as the name of the commodity picture, storing the picture in a Seaweed database at a set storage position, and generating a corresponding storage address string;
and step 13, classifying the commodities, adding first-level and second-level commodity labels, and storing the commodity labels, the collected commodity text, the md5 value of the commodity picture and the storage address string in an ES database.
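The record assembled in steps 11 to 13 could take a shape like the one below. This is only a sketch: every field name is hypothetical, the two label fields assume a first-level and a second-level label, and no Elasticsearch client call is shown because the patent does not describe the index layout.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProductRecord:
    """One collected commodity, as it might be stored after step 13."""
    product_id: str
    name: str
    description: str
    label_level1: str                      # assumed first-level label
    label_level2: str                      # assumed second-level label
    picture_md5: Optional[str] = None      # None when the listing has no picture
    picture_address: Optional[str] = None  # storage address string of the picture


def to_document(record: ProductRecord) -> dict:
    """Convert the record into a plain dict ready to be indexed."""
    return asdict(record)
```

Keeping `picture_md5` empty for listings without a picture mirrors the check in step 21, which fetches a picture only when the md5 value is not empty.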
Step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data collected in step 10 and the commodity classification result. Specifically, based on the commodity text (commodity id, commodity name and commodity description), the commodity picture, the md5 value of the commodity picture and the commodity classification result collected in step 10, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text are calculated respectively, and these feature values are finally stored, together with the basic information of the commodity, in a MySQL commodity feature vector table. As shown in fig. 3, step 20 comprises the following sub-steps:
step 21, reading from the ES database the commodity text, which comprises the commodity id, the commodity name and the commodity description, together with the md5 value of the commodity picture, and, for each commodity whose md5 value is not empty, acquiring the commodity picture from the Seaweed database according to the corresponding storage address string;
step 22, combining the commodity id, the commodity name and the commodity description as complete commodity characters;
step 23, calculating a perceptual hash fingerprint value of the commodity picture; the method for calculating the perceptual hash fingerprint value is the prior art, and is not described herein again.
Step 24, calculating the Word2Vec sentence vector value of the commodity text; the method for calculating the Word2Vec sentence vector value is prior art and is not described herein again.
And step 25, storing the data obtained in the steps 21, 23 and 24 into a MySQL commodity feature vector table.
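Since the patent leaves the perceptual hash of step 23 to the prior art, the sketch below uses an average hash as a simplified stand-in rather than a DCT-based perceptual hash. It assumes the picture has already been converted to a small grayscale grid (for example 8 x 8); the function names are hypothetical.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_dist(h1: int, h2: int) -> int:
    """Count the bit positions in which the two fingerprints differ."""
    return bin(h1 ^ h2).count("1")
```

Unlike an md5 value, a fingerprint of this kind changes only slightly when the picture is recompressed or mildly altered, which is why the Hamming distance between two fingerprints can serve as a picture distance.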
And step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text. Specifically, based on the data obtained in step 20 (the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text), the commodity similarity is calculated by combining a similarity measure based on Hamming distance with a similarity measure based on cosine similarity, and the commodity ids and similarity of each pair of commodities whose similarity is greater than or equal to a preset similarity threshold are stored in a MySQL commodity similarity table. As shown in fig. 4, step 30 comprises the following sub-steps:
step 31, reading the commodity id of a new commodity (namely a newly collected commodity), recorded as id1; reading the md5 value of the commodity picture of the new commodity, recorded as md5_1; reading the perceptual hash fingerprint value of the commodity picture of the new commodity, recorded as h1; reading the Word2Vec sentence vector value of the commodity text of the new commodity, recorded as v1; and reading the first-level and second-level commodity labels of the new commodity, recorded as c and t;
step 32, reading from the MySQL commodity feature vector table the attribute values of the commodities whose first-level and second-level commodity labels are the same as those of the new commodity, wherein the attribute values comprise the commodity id, the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text;
step 33, selecting in sequence, from the commodities obtained in step 32, one commodity that has not yet been compared, and recording its commodity id as id2, the md5 value of its commodity picture as md5_2, the perceptual hash fingerprint value of its commodity picture as h2, and the Word2Vec sentence vector value of its commodity text as v2;
step 34, if md5_1 = md5_2, setting the similarity s of the two commodities to 1;
step 35, if the condition in step 34 is not satisfied, then:
(1) calculating the Hamming distance between the perceptual hash fingerprint values of the commodity pictures of the two commodities, d = hamming_dist(h1, h2);
(2) calculating the cosine similarity between the Word2Vec sentence vector values of the commodity text of the two commodities, c = cos_similarity(v1, v2);
(3) setting the similarity of the two commodities to s = (1/ln(e + d/10) + c)/2, wherein e is the base of the natural logarithm;
step 36, comparing the similarity s obtained in step 34 or step 35 with a preset similarity threshold λ, and retaining the pairs of commodities whose similarity s is greater than or equal to λ;
step 37, combining the commodity ids and the similarity s of each pair of commodities satisfying the condition of step 36 into a triple (id1, id2, s) and storing the triple in the MySQL commodity similarity table;
and step 38, returning to step 33 until all the commodities have been compared.
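Steps 33 to 38 can be sketched as a single comparison loop. The sketch assumes that the candidates have already been filtered by commodity labels (step 32), that every commodity has a picture, and that fingerprints are stored as integers; the dictionary keys and function signatures are hypothetical, and the returned list stands in for the MySQL commodity similarity table.

```python
import math
from typing import Dict, List, Tuple


def hamming_dist(h1: int, h2: int) -> int:
    """Count the differing bits of two integer fingerprints."""
    return bin(h1 ^ h2).count("1")


def cos_similarity(v1: List[float], v2: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


def similarity(p1: Dict, p2: Dict) -> float:
    # Step 34: identical md5 values give s = 1.
    # Skipping empty md5 values is an assumption, not stated in the patent.
    if p1["md5"] is not None and p1["md5"] == p2["md5"]:
        return 1.0
    # Step 35: fuse the picture term and the text term.
    d = hamming_dist(p1["phash"], p2["phash"])
    c = cos_similarity(p1["vec"], p2["vec"])
    return (1.0 / math.log(math.e + d / 10.0) + c) / 2.0


def find_similar(new: Dict, candidates: List[Dict], lam: float) -> List[Tuple[str, str, float]]:
    triples = []
    for other in candidates:          # steps 33 and 38: visit each candidate once
        s = similarity(new, other)
        if s >= lam:                  # step 36: keep pairs with s >= threshold
            triples.append((new["id"], other["id"], s))  # step 37: (id1, id2, s)
    return triples
```

A call such as `find_similar(new_product, same_label_products, 0.8)` returns the triples for one newly collected commodity; the threshold value 0.8 is only an example, since the patent does not fix a value for the threshold.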
Dark net similar commodity judgment is completed through the above dark net similar commodity judgment method based on multimode fusion characteristics. The method has the following advantages:
(1) By constructing a similarity calculation method that fuses commodity pictures and commodity text, the method can calculate the similarity between commodities in dark net markets and obtain the similar commodities within each category, which helps classify dark net market commodities and improves the accuracy of commodity similarity judgment. The approach is simple and highly interpretable, and it addresses the difficulty of judging similar commodities when dark net commodity pictures are blurred and text information is sparse.
(2) Through dark net data acquisition, commodity picture feature calculation, commodity text feature calculation and similarity calculation, the method can effectively monitor the trading of new commodities, provide real-time early warning and better track dark net market dynamics.
In addition, in some embodiments, a computer terminal storage medium is provided, which stores computer terminal executable instructions, where the computer terminal executable instructions are configured to execute the dark web similar goods determination method based on the multi-mode fusion feature as described in the foregoing embodiments. Examples of the computer storage medium include a magnetic storage medium (e.g., a floppy disk, a hard disk, etc.), an optical recording medium (e.g., a CD-ROM, a DVD, etc.), or a memory such as a memory card, a ROM, a RAM, or the like. The computer storage media may also be distributed over a network-connected computer system, such as an application store.
Furthermore, in some embodiments, a computing device is presented, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for determining a dark web similar goods based on multi-mode fusion features as described in the previous embodiments. Examples of computing devices include PCs, tablets, smart phones, or PDAs, among others.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A dark net similar commodity judgment method based on multimode fusion characteristics is characterized by comprising the following steps:
step 10, collecting dark net commodity data and classifying commodities; the collected dark net commodity data comprise commodity characters and commodity pictures, and an md5 value is generated for the collected commodity pictures;
step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data collected in step 10 and the commodity classification result;
and step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text.
2. The dark net similar commodity judgment method based on the multimode fusion characteristic as claimed in claim 1, wherein the step 10 comprises the following substeps:
step 11, constructing a customized acquisition strategy aiming at a data typesetting format and a reverse-crawling mechanism of a dark net target site, and realizing acquisition of dark net commodity data, wherein the acquired dark net commodity data comprises structured commodity characters and commodity pictures of a commodity detail page; the commodity characters comprise commodity id, commodity names and commodity descriptions;
step 12, for the commodity with the commodity picture, acquiring the commodity picture, simultaneously acquiring an md5 value of the commodity picture by using a general md5 calculation method, taking the md5 value as the name of the commodity picture, storing the commodity picture in a Seaweed database according to a set storage position, and generating a corresponding storage address string;
and step 13, classifying the commodities, adding first-level and second-level commodity labels, and storing the commodity labels, the collected commodity text, the md5 value of the commodity picture and the storage address string in an ES database.
3. The dark net similar commodity judgment method based on the multimode fusion characteristic as claimed in claim 2, wherein the step 20 comprises the following substeps:
step 21, reading from the ES database the commodity text, which comprises the commodity id, the commodity name and the commodity description, together with the md5 value of the commodity picture, and, for each commodity whose md5 value is not empty, acquiring the commodity picture from the Seaweed database according to the corresponding storage address string;
step 22, combining the commodity id, the commodity name and the commodity description as complete commodity characters;
step 23, calculating a perceptual hash fingerprint value of the commodity picture;
step 24, calculating the Word2Vec sentence vector value of the commodity text;
and step 25, storing the data obtained in the steps 21, 23 and 24 into a MySQL commodity feature vector table.
4. The dark net similar commodity judgment method based on the multimode fusion characteristic as claimed in claim 3, wherein the step 30 comprises the following substeps:
step 31, reading a commodity id of a new commodity, recording the commodity id as id1, reading a picture md5 value of the new commodity as md5_1, reading a perceptual hash fingerprint value of the commodity picture of the new commodity, recording the perceptual hash fingerprint value as h1, reading a Word2Vec sentence vector value of a commodity character of the new commodity, recording the vector value as v1, and reading a secondary commodity label of the new commodity, recording the secondary commodity label as c and t;
step 32, reading the attribute values of the commodities, which are the same as the first-level and second-level commodity labels of the new commodities, from the MySQL commodity feature vector table, wherein the attribute values also comprise a commodity id, an md5 value of a commodity picture, a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of a commodity text:
step 33, sequentially selecting one commodity which does not participate in comparison from the commodities corresponding to the attribute values of the commodities obtained in the step 32, setting the commodity id to be id2, the md5 value of the commodity picture to be md5_2, the perceptual hash fingerprint value of the commodity picture to be h2, and the Word2Vec sentence vector value of the commodity text to be v 2;
step 34, if md5_1 = md5_2, setting the similarity s of the two commodities to 1;
step 35, if the condition in step 34 is not satisfied, then:
(1) calculating the Hamming distance d = hamming_dist(h1, h2) between the perceptual hash fingerprint values of the commodity pictures of the two commodities;
(2) calculating the cosine similarity c = cos_similarity(v1, v2) between the Word2Vec sentence vector values of the two commodities;
(3) setting the similarity of the two commodities to s = (1/ln(e + d/10) + c)/2, wherein e is the base of the natural logarithm;
step 36, comparing the similarity s of the two commodities obtained in step 34 or 35 with a preset similarity threshold λ, and retaining the pairs of commodities whose similarity s is greater than or equal to λ;
step 37, combining the commodity ids and the similarity s of the two commodities that satisfy the condition of step 36 into a triple (id1, id2, s) and storing the triple into a MySQL commodity similarity table;
and step 38, returning to step 33 until all the commodities are compared.
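The comparison loop of steps 33 to 38 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: fingerprints are assumed to be Python integers, sentence vectors are plain lists of floats, each commodity is a dictionary with the hypothetical keys `id`, `md5`, `phash` and `vector`, and the triples are returned in memory instead of being written to MySQL. The guard against two empty md5 values and the handling of a zero-length vector are also assumptions, since the claim does not cover those cases.

```python
import math


def hamming_dist(h1, h2):
    # Number of differing bits between two integer fingerprints.
    return bin(h1 ^ h2).count("1")


def cos_similarity(v1, v2):
    # Cosine of the angle between two vectors; 0.0 if either has zero length
    # (that fallback is an assumption, not part of the claim).
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


def similarity(md5_1, h1, v1, md5_2, h2, v2):
    # Step 34: identical picture md5 values give similarity 1.
    # Requiring a non-empty md5 is an assumption.
    if md5_1 and md5_1 == md5_2:
        return 1.0
    # Step 35: fuse picture distance d and text similarity c.
    # Assumes both commodities have a fingerprint.
    d = hamming_dist(h1, h2)
    c = cos_similarity(v1, v2)
    return (1 / math.log(math.e + d / 10) + c) / 2


def find_similar(new_item, candidates, lam):
    # Steps 33, 36, 37 and 38: compare the new commodity with every candidate
    # that shares its labels and keep the triples with s >= lam.
    triples = []
    for cand in candidates:
        s = similarity(new_item["md5"], new_item["phash"], new_item["vector"],
                       cand["md5"], cand["phash"], cand["vector"])
        if s >= lam:
            triples.append((new_item["id"], cand["id"], s))
    return triples
```

With d = 0 the picture term 1/ln(e + d/10) equals 1 and it decreases slowly as d grows, so when the cosine similarity c lies in [0, 1] the fused score s also stays in [0, 1].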
5. A computer terminal storage medium storing computer terminal-executable instructions for performing the dark net similar commodity judgment method based on multimode fusion characteristics according to any one of claims 1 to 4.
6. A computing device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dark net similar commodity judgment method based on multimode fusion characteristics according to any one of claims 1 to 4.
CN202111367617.3A 2021-11-18 2021-11-18 A method, storage medium and computing device for determining similar dark web products based on multi-mode fusion features Active CN114090901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111367617.3A CN114090901B (en) 2021-11-18 2021-11-18 A method, storage medium and computing device for determining similar dark web products based on multi-mode fusion features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111367617.3A CN114090901B (en) 2021-11-18 2021-11-18 A method, storage medium and computing device for determining similar dark web products based on multi-mode fusion features

Publications (2)

Publication Number Publication Date
CN114090901A (en) 2022-02-25
CN114090901B (en) 2025-06-13

Family

ID=80301542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111367617.3A Active CN114090901B (en) 2021-11-18 2021-11-18 A method, storage medium and computing device for determining similar dark web products based on multi-mode fusion features

Country Status (1)

Country Link
CN (1) CN114090901B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798517A (en) * 2023-02-08 2023-03-14 南京邮电大学 Commodity search method and system based on voice information feature data
CN116016999A (en) * 2022-12-12 2023-04-25 北京爱奇艺科技有限公司 A method, device, electronic device, and readable storage medium for determining the same product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824904A (en) * 2016-03-15 2016-08-03 浙江大学 Chinese herbal medicine plant picture capturing method based on professional term vector of traditional Chinese medicine and pharmacy field
US20170185675A1 (en) * 2014-05-27 2017-06-29 Telefonaktiebolaget Lm Ericsson (Publ) Fingerprinting and matching of content of a multi-media file
CN110472002A (en) * 2019-08-14 2019-11-19 腾讯科技(深圳)有限公司 A kind of text similarity acquisition methods and device
CN112084448A (en) * 2020-08-31 2020-12-15 北京金堤征信服务有限公司 Similar information processing method and device
CN112633363A (en) * 2020-12-21 2021-04-09 上海明略人工智能(集团)有限公司 Commodity feature similarity calculation method and system
CN113641846A (en) * 2021-08-12 2021-11-12 中国石油大学(华东) A Cross-modal Retrieval Model Based on Strong Representation Deep Hashing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
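DI WANG et al.: "Robust and flexible discrete hashing for cross-modal similarity search", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》, vol. 28, no. 10, 4 July 2017 (2017-07-04), pages 2703, XP011701916, DOI: 10.1109/TCSVT.2017.2723302 *
Cui Tongtong: "Research on a coarse classification method for academic papers based on the fusion of topic and semantic fingerprints" (基于主题和语义指纹融合的学术论文粗分类方法研究), 《China Master's Theses Full-text Database, Information Science and Technology》, 15 December 2018 (2018-12-15), pages 138 - 1913 *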

Also Published As

Publication number Publication date
CN114090901B (en) 2025-06-13

Similar Documents

Publication Publication Date Title
US10565498B1 (en) Deep neural network-based relationship analysis with multi-feature token model
CN106919619B (en) Commodity clustering method and device and electronic equipment
James Pattern recognition
CN110543592B (en) Information searching method and device and computer equipment
US11797530B1 (en) Artificial intelligence system for translation-less similarity analysis in multi-language contexts
WO2022156525A1 (en) Object matching method and apparatus, and device
CN108664637B (en) Retrieval method and system
CN115564469A (en) Advertisement creative selection and model training method, device, equipment and storage medium
Baluja Learning typographic style: from discrimination to synthesis
Huffer et al. Fleshing out the bones: Studying the human remains trade with Tensorflow and Inception
CN114090901A (en) Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device
CN111652144B (en) Question segmentation methods, devices, equipment and media based on target area fusion
CN118193806B (en) Target retrieval method, target retrieval device, electronic equipment and storage medium
CN118036584A (en) Intelligent bidding document generation method and system
CN113297472A (en) Method and device for releasing video content and commodity object information and electronic equipment
CN115955596A (en) Method, apparatus, device and medium for providing video-related information
Mohsin et al. Convolutional neural networks for real-time wood plank detection and defect segmentation
CN114239569B (en) Method and device for analyzing evaluation text, and computer-readable storage medium
Alshehri An Online Fake Review Detection Approach Using Famous Machine Learning Algorithms.
Thwe et al. A semi-supervised learning approach for automatic detection and fashion product category prediction with small training dataset using FC-YOLOv4
JP7709948B2 (en) Information processing device, information processing method, and program
CN119003721A (en) Video file generation method and device and electronic equipment
CN113127597A (en) Processing method and device for search information and electronic equipment
CN113836906B (en) Tender generation method, device and server
Karim et al. Classification of Google Play Store Application Reviews Using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant