CN118862879A - Automated data labeling method and system based on deep learning - Google Patents
Automated data labeling method and system based on deep learning
- Publication number
- CN118862879A
- Authority
- CN
- China
- Prior art keywords
- data
- model
- labeling
- text
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410872145.4A CN118862879B (zh) | 2024-07-01 | 2024-07-01 | Automated data labeling method and system based on deep learning |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410872145.4A CN118862879B (zh) | 2024-07-01 | 2024-07-01 | Automated data labeling method and system based on deep learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118862879A (zh) | 2024-10-29 |
| CN118862879B (zh) | 2025-03-14 |
Family
ID=93157036
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410872145.4A Active CN118862879B (zh) | 2024-07-01 | 2024-07-01 | Automated data labeling method and system based on deep learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118862879B (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119149919A (zh) * | 2024-11-15 | 2024-12-17 | 厦门两万里文化传媒有限公司 | Method for evaluating labeled data quality based on active learning |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
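| CN111241243A (zh) * | 2020-01-13 | 2020-06-05 | Central China Normal University | Method for constructing and labeling test question, knowledge, and ability tensors for knowledge measurement |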
| US20220051083A1 (en) * | 2020-08-11 | 2022-02-17 | Nec Laboratories America, Inc. | Learning word representations via commonsense reasoning |
| US20230244987A1 (en) * | 2022-02-01 | 2023-08-03 | Capital One Services, Llc | Accelerated data labeling with automated data profiling for training machine learning predictive models |
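| CN118036577A (zh) * | 2024-04-11 | 2024-05-14 | 一百分信息技术有限公司 | Sequence labeling method in natural language processing |
| CN118069785A (zh) * | 2024-02-26 | 2024-05-24 | Zhengzhou University | Multi-feature fusion method and device for offensive text detection |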
- 2024-07-01: CN application CN202410872145.4A, patent CN118862879B (zh), status Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
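| CN111241243A (zh) * | 2020-01-13 | 2020-06-05 | Central China Normal University | Method for constructing and labeling test question, knowledge, and ability tensors for knowledge measurement |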
| US20220051083A1 (en) * | 2020-08-11 | 2022-02-17 | Nec Laboratories America, Inc. | Learning word representations via commonsense reasoning |
| US20230244987A1 (en) * | 2022-02-01 | 2023-08-03 | Capital One Services, Llc | Accelerated data labeling with automated data profiling for training machine learning predictive models |
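| CN118069785A (zh) * | 2024-02-26 | 2024-05-24 | Zhengzhou University | Multi-feature fusion method and device for offensive text detection |
| CN118036577A (zh) * | 2024-04-11 | 2024-05-14 | 一百分信息技术有限公司 | Sequence labeling method in natural language processing |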
Non-Patent Citations (3)
| Title |
|---|
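| YE, FAN: "Co-occurrence statistics-based global and local feature learning for graph networks", 《SOFT COMPUTING》, 15 July 2023 (2023-07-15) * |
| 何彬; 李心宇; 陈蓓蕾; 夏盟; 曾致中: "Knowledge point labeling model for test questions based on deep mining of attribute relations", Journal of Nanjing University of Information Science & Technology (Natural Science Edition), no. 06, 28 November 2019 (2019-11-28) * |
| 陈航: "Research and system implementation of positive-unlabeled learning algorithms for graph data classification", 《China Master's Theses Full-text Database》, 16 December 2022 (2022-12-16) * |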
Also Published As
| Publication number | Publication date |
|---|---|
| CN118862879B (zh) | 2025-03-14 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Asmussen et al. | Smart literature review: a practical topic modelling approach to exploratory literature review | |
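| CN114548321B (zh) | Self-supervised method for classifying opinion targets in public-opinion comments based on contrastive learning | |
| Ciurumelea et al. | Suggesting comment completions for python using neural language models | |
| CN117574858A (zh) | Method for automatically generating similar-case retrieval reports based on a large language model | |
| Somogyi | The application of artificial intelligence | |
| CN112181490A (zh) | Method, apparatus, device, and medium for identifying function categories in the function point estimation method | |
| CN115146062A (zh) | Intelligent event analysis method and system combining expert recommendation and text clustering | |
| CN119026597A (zh) | Method and device for evaluating science and technology intelligence sources for science and technology intelligence analysis | |
| Xu et al. | A GitHub-based data collection method for software defect prediction | |
| CN119378494A (zh) | Entity relation extraction method and system for constructing financial-domain knowledge graphs | |
| US20250245665A1 (en) | Fraud risk analysis system incorporating a large language model | |
| CN118862879B (zh) | Automated data labeling method and system based on deep learning | |
| CN119942206A (zh) | Method, apparatus, computer device, and storage medium for classifying abnormal image states | |
| Bernhard-Harrer et al. | Beyond standardization: A comprehensive review of topic modeling validation methods for computational social science research | |
| CN120337938A (zh) | Sensitive information identification method, apparatus, device, storage medium, and program product | |
| CN114911928A (zh) | Method and device for automatic classification and recommendation of long texts | |
| CN120216744A (zh) | Intelligent monitoring and analysis method for tobacco public opinion based on knowledge distillation | |
| CN118350368B (zh) | Multi-document summarization and compilation method using a large language model based on NLP technology | |
| CN118467765B (zh) | Method, device, and medium for improving the generalization ability of cross-modal image retrieval models | |
| Zhu et al. | Detecting authorship between generative AI models and humans: a Burrows’s Delta approach | |
| CN117851860A (zh) | Method for automatically generating data classification and grading templates | |
| CN117875706A (zh) | AI-based digital management method for rating processes | |
| CN117010397A (zh) | Named entity recognition method for dispute mediation texts based on an efficient pointer network | |
| CN117332787A (zh) | Visual text data classification method based on text-clustering semantic clouds | |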
| Hrín | Methods for investigating the external and internal validity of machine learned signals |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20250221. Address after: Building C5, 11th Floor, R&D Room 2, Rongke Zhigu Industrial Project (Phase III), No. 555 Wenhua Avenue, Hongshan District, Wuhan City, Hubei Province 430074. Applicant after: Jiajie Technology Co.,Ltd. Country or region after: China. Address before: Room 306, 3rd Floor, No. 138 Fengtai South Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210012. Applicant before: Juming Data (Nanjing) Co.,Ltd. Country or region before: China. |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Address after: 430000 Hubei Province, Wuhan City, Wuchang District, Yuntai Road No. 22, Lijiang Longcheng Building 1, 5th Floor. Patentee after: Jiajie Technology Co.,Ltd. Country or region after: China. Address before: Building C5, 11th Floor, R&D Room 2, Rongke Zhigu Industrial Project (Phase III), No. 555 Wenhua Avenue, Hongshan District, Wuhan City, Hubei Province 430074. Patentee before: Jiajie Technology Co.,Ltd. Country or region before: China. |