MX2016003981A - Method and device for training a classifier and for type recognition - Google Patents
Method and device for training a classifier and for type recognition
- Publication number
- MX2016003981A
- Authority
- MX
- Mexico
- Prior art keywords
- sample
- classifier
- term
- training
- characteristic
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/268—Morphological analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Operations Research (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The disclosure provides a method and device for training a classifier and for type recognition, which belong to the field of natural language processing. The classifier training method includes: extracting, from sample information, sample clauses that contain a target keyword; binary-labeling each sample clause according to whether it belongs to the target class, to obtain a sample training set; performing word segmentation on each sample clause in the sample training set to obtain a plurality of words; extracting a specified feature set from the plurality of words, the specified feature set including at least one feature word; constructing a classifier based on the feature words in the specified feature set; and training the classifier based on the binary labeling results in the sample training set. Because the feature words in the specified feature set are extracted by performing word segmentation on sample clauses that contain the target keyword, the classifier can make accurate predictions for clauses containing the target keyword and therefore achieves accurate recognition results. Figure 1.
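The training steps in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier type, the feature-selection criterion, or the segmentation method, so everything concrete here is an assumption: a Bernoulli naive Bayes model with Laplace smoothing, feature words chosen by document frequency, whitespace tokenization standing in for word segmentation, and a hypothetical keyword (`birthday`) with made-up sample sentences and hand-assigned labels.

```python
import math
import re
from collections import Counter


def extract_clauses(texts, keyword):
    """Split texts into clauses and keep those containing the target keyword."""
    clauses = []
    for text in texts:
        for clause in re.split(r"[.,;!?]", text):
            clause = clause.strip().lower()
            if keyword in clause.split():
                clauses.append(clause)
    return clauses


def segment(clause):
    """Whitespace tokenization (assumed stand-in for real word segmentation)."""
    return clause.split()


def select_features(labeled, top_k=20):
    """Pick feature words by document frequency (assumed criterion)."""
    counts = Counter(w for clause, _ in labeled for w in set(segment(clause)))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # deterministic ties
    return set(ranked[:top_k])


def train(labeled, features):
    """Estimate Bernoulli naive Bayes parameters with Laplace smoothing."""
    n = {True: 0, False: 0}
    word_counts = {True: Counter(), False: Counter()}
    for clause, label in labeled:
        n[label] += 1
        word_counts[label].update(set(segment(clause)) & features)
    total = n[True] + n[False]
    model = {}
    for label in (True, False):
        prior = (n[label] + 1) / (total + 2)
        likelihood = {w: (word_counts[label][w] + 1) / (n[label] + 2)
                      for w in features}
        model[label] = (prior, likelihood)
    return model


def predict(model, features, clause):
    """Return True if the clause is more likely to belong to the target class."""
    words = set(segment(clause)) & features
    scores = {}
    for label, (prior, likelihood) in model.items():
        score = math.log(prior)
        for w in features:
            p = likelihood[w]
            score += math.log(p if w in words else 1.0 - p)
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical sample information and hand-assigned binary labels
# (True = the clause states when a birthday is).
texts = [
    "My birthday is tomorrow, see you then",
    "Happy birthday to you",
    "His birthday is next Friday",
    "I bought a birthday cake",
    "Her birthday is on Monday",
    "Nice birthday party yesterday",
]
labels = [True, False, True, False, True, False]

clauses = extract_clauses(texts, "birthday")
labeled = list(zip(clauses, labels))
features = select_features(labeled)
model = train(labeled, features)

print(predict(model, features, "our birthday is sunday"))
print(predict(model, features, "a birthday cake for you"))
```

Only clauses that contain the keyword reach the training set, so the feature words reflect the context around that keyword; this is the property the abstract credits for the accuracy of the resulting classifier.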
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510511468.1A CN105117384A (zh) | 2015-08-19 | 2015-08-19 | Classifier training method, type recognition method, and apparatus |
| PCT/CN2015/097615 WO2017028416A1 (zh) | 2015-08-19 | 2015-12-16 | Classifier training method, type recognition method, and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| MX2016003981A (es) | 2017-04-27 |
Family
ID=54665378
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| MX2016003981A (es) | 2015-08-19 | 2015-12-16 | Method and device for training a classifier and for type recognition |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US20170052947A1 (es) |
| EP (1) | EP3133532A1 (es) |
| JP (1) | JP2017535007A (es) |
| KR (1) | KR101778784B1 (es) |
| CN (1) | CN105117384A (es) |
| MX (1) | MX2016003981A (es) |
| RU (1) | RU2643500C2 (es) |
| WO (1) | WO2017028416A1 (es) |
Families Citing this family (26)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
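| CN105117384A (zh) * | 2015-08-19 | 2015-12-02 | 小米科技有限责任公司 | Classifier training method, type recognition method, and apparatus |
| CN106060000B (zh) * | 2016-05-06 | 2020-02-07 | 青岛海信移动通信技术股份有限公司 | Method and device for recognizing verification information |
| CN106211165B (zh) * | 2016-06-14 | 2020-04-21 | 北京奇虎科技有限公司 | Method and apparatus for detecting foreign-language harassing text messages, and corresponding client |
| CN107135494B (zh) * | 2017-04-24 | 2020-06-19 | 北京小米移动软件有限公司 | Spam text message recognition method and apparatus |
| CN110444199B (zh) * | 2017-05-27 | 2022-01-07 | 腾讯科技(深圳)有限公司 | Speech keyword recognition method, apparatus, terminal, and server |
| CN110019782B (zh) * | 2017-09-26 | 2021-11-02 | 北京京东尚科信息技术有限公司 | Method and apparatus for outputting a text category |
| CN107704892B (zh) * | 2017-11-07 | 2019-05-17 | 宁波爱信诺航天信息有限公司 | Commodity code classification method and system based on a Bayesian model |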
| US10726204B2 (en) | 2018-05-24 | 2020-07-28 | International Business Machines Corporation | Training data expansion for natural language classification |
| CN109325123B (zh) * | 2018-09-29 | 2020-10-16 | 武汉斗鱼网络科技有限公司 | Bayesian document classification method, apparatus, device, and medium based on complement-set features |
| US11100287B2 (en) * | 2018-10-30 | 2021-08-24 | International Business Machines Corporation | Classification engine for learning properties of words and multi-word expressions |
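| CN109979440B (zh) * | 2019-03-13 | 2021-05-11 | 广州市网星信息技术有限公司 | Keyword sample determination method, speech recognition method, apparatus, device, and medium |
| CN109992771B (zh) * | 2019-03-13 | 2020-05-05 | 北京三快在线科技有限公司 | Text generation method and apparatus |
| CN110083835A (zh) * | 2019-04-24 | 2019-08-02 | 北京邮电大学 | Keyword extraction method and apparatus based on graph and word-sentence collaboration |
| CN111339297B (zh) * | 2020-02-21 | 2023-04-25 | 广州天懋信息系统股份有限公司 | Network asset anomaly detection method, system, medium, and device |
| CN113688436A (zh) * | 2020-05-19 | 2021-11-23 | 天津大学 | Hardware Trojan detection method fusing PCA with naive Bayes classification |
| CN112529623B (zh) * | 2020-12-14 | 2023-07-11 | 中国联合网络通信集团有限公司 | Malicious user identification method, apparatus, and device |
| CN112925958A (zh) * | 2021-02-05 | 2021-06-08 | 深圳力维智联技术有限公司 | Multi-source heterogeneous data adaptation method, apparatus, device, and readable storage medium |
| CN114969239A (zh) * | 2021-02-27 | 2022-08-30 | 北京紫冬认知科技有限公司 | Medical case data processing method, apparatus, electronic device, and storage medium |
| CN114281983B (zh) * | 2021-04-05 | 2024-04-12 | 北京智慧星光信息技术有限公司 | Hierarchical text classification method, system, electronic device, and storage medium |
| CN113570269B (zh) * | 2021-08-03 | 2024-10-18 | 工银科技有限公司 | Management method, apparatus, device, medium, and program product for operation and maintenance projects |
| CN113705818B (zh) * | 2021-08-31 | 2024-04-19 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for attributing fluctuations in payment indicators |
| CN114706991B (zh) * | 2022-01-27 | 2025-08-05 | 清华大学 | Knowledge network construction method, apparatus, device, and storage medium |
| CN116094886B (zh) * | 2023-03-09 | 2023-08-25 | 浙江万胜智能科技股份有限公司 | Carrier communication data processing method and system in a dual-mode module |
| CN116467604A (zh) * | 2023-04-27 | 2023-07-21 | 中国工商银行股份有限公司 | Dialogue state recognition method, apparatus, computer device, and storage medium |
| CN116894216A (zh) * | 2023-07-19 | 2023-10-17 | 中国工商银行股份有限公司 | Method and apparatus for determining server hardware alarm categories, and electronic device |
| CN117910875B (zh) * | 2024-01-22 | 2024-07-19 | 青海省科技发展服务中心 | Stress-resistance evaluation system for Elymus resources |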
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH11203318A (ja) * | 1998-01-19 | 1999-07-30 | Seiko Epson Corp | Document classification method and apparatus, and recording medium storing a document classification processing program |
| US6192360B1 (en) * | 1998-06-23 | 2001-02-20 | Microsoft Corporation | Methods and apparatus for classifying text and for building a text classifier |
| US7376635B1 (en) * | 2000-07-21 | 2008-05-20 | Ford Global Technologies, Llc | Theme-based system and method for classifying documents |
| US7624006B2 (en) * | 2004-09-15 | 2009-11-24 | Microsoft Corporation | Conditional maximum likelihood estimation of naïve bayes probability models |
| JP2006301972A (ja) | 2005-04-20 | 2006-11-02 | Mihatenu Yume:Kk | Electronic secretary device |
| US7818176B2 (en) | 2007-02-06 | 2010-10-19 | Voicebox Technologies, Inc. | System and method for selecting and presenting advertisements based on natural language processing of voice-based input |
| US8082151B2 (en) * | 2007-09-18 | 2011-12-20 | At&T Intellectual Property I, Lp | System and method of generating responses to text-based messages |
| CN101516071B (zh) * | 2008-02-18 | 2013-01-23 | 中国移动通信集团重庆有限公司 | Spam short message classification method |
| US20100161406A1 (en) * | 2008-12-23 | 2010-06-24 | Motorola, Inc. | Method and Apparatus for Managing Classes and Keywords and for Retrieving Advertisements |
| JP5346841B2 (ja) * | 2010-02-22 | 2013-11-20 | 株式会社野村総合研究所 | Document classification system, document classification program, and document classification method |
| US8892488B2 (en) * | 2011-06-01 | 2014-11-18 | Nec Laboratories America, Inc. | Document classification with weighted supervised n-gram embedding |
| RU2491622C1 (ru) * | 2012-01-25 | 2013-08-27 | Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" | Method for classifying documents by category |
| CN103246686A (zh) * | 2012-02-14 | 2013-08-14 | 阿里巴巴集团控股有限公司 | Text classification method and apparatus, and feature processing method and apparatus for text classification |
| US9910909B2 (en) * | 2013-01-23 | 2018-03-06 | 24/7 Customer, Inc. | Method and apparatus for extracting journey of life attributes of a user from user interactions |
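| CN103336766B (zh) * | 2013-07-04 | 2016-12-28 | 微梦创科网络科技(中国)有限公司 | Short-text spam recognition and modeling method and apparatus |
| CN103501487A (zh) * | 2013-09-18 | 2014-01-08 | 小米科技有限责任公司 | Classifier update method, apparatus, terminal, server, and system |
| CN103500195B (zh) * | 2013-09-18 | 2016-08-17 | 小米科技有限责任公司 | Classifier update method, apparatus, system, and device |
| CN103885934B (zh) * | 2014-02-19 | 2017-05-03 | 中国专利信息中心 | Automatic key-phrase extraction method for patent documents |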
| US10394953B2 (en) * | 2015-07-17 | 2019-08-27 | Facebook, Inc. | Meme detection in digital chatter analysis |
| CN105117384A (zh) * | 2015-08-19 | 2015-12-02 | 小米科技有限责任公司 | Classifier training method, type recognition method, and apparatus |
-
2015
- 2015-08-19 CN CN201510511468.1A patent/CN105117384A/zh active Pending
- 2015-12-16 MX MX2016003981A patent/MX2016003981A/es unknown
- 2015-12-16 KR KR1020167003870A patent/KR101778784B1/ko active Active
- 2015-12-16 JP JP2017534873A patent/JP2017535007A/ja active Pending
- 2015-12-16 WO PCT/CN2015/097615 patent/WO2017028416A1/zh not_active Ceased
- 2015-12-16 RU RU2016111677A patent/RU2643500C2/ru active
-
2016
- 2016-07-27 US US15/221,248 patent/US20170052947A1/en not_active Abandoned
- 2016-07-29 EP EP16182001.4A patent/EP3133532A1/en not_active Withdrawn
Also Published As
| Publication number | Publication date |
|---|---|
| CN105117384A (zh) | 2015-12-02 |
| RU2643500C2 (ru) | 2018-02-01 |
| KR101778784B1 (ko) | 2017-09-26 |
| WO2017028416A1 (zh) | 2017-02-23 |
| EP3133532A1 (en) | 2017-02-22 |
| US20170052947A1 (en) | 2017-02-23 |
| RU2016111677A (ru) | 2017-10-04 |
| KR20170032880A (ko) | 2017-03-23 |
| JP2017535007A (ja) | 2017-11-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| MX2016003981A (es) | | Method and device for training a classifier and for type recognition |
| PH12018501058A1 (en) | | Order clustering and malicious information combating method and apparatus |
| EP3926526A3 (en) | | Optical character recognition method and apparatus, electronic device and storage medium |
| EP4075395A3 (en) | | Method and apparatus of training anti-spoofing model, method and apparatus of performing anti-spoofing, and device |
| GB2549875A (en) | | Automated content classification/filtering |
| MX2016003769A (es) | | Method and device for region extraction |
| GB2575611A (en) | | Systems and methods for model-assisted cohort selection |
| WO2019133928A8 (en) | | Hierarchical, parallel models for extracting in real time high-value information from data streams and system and method for creation of same |
| US20170091318A1 (en) | | Apparatus and method for extracting keywords from a single document |
| MX2016002854A (es) | | Content-based image retrieval |
| CN108959242A (zh) | | Target entity recognition method and device based on part-of-speech features of Chinese characters |
| JP2016508264A5 (es) | | |
| MX2017008583A (es) | | Discrimination of ambiguous expressions to improve the user experience |
| MX365897B (es) | | Method and apparatus for determining similarity, and terminal |
| CN104680178B (zh) | | Image classification method based on transfer-learning multi-attractor cellular automata |
| CN105117740B (zh) | | Font recognition method and device |
| CN105139041A (zh) | | Image-based language identification method and device |
| GB2523973A (en) | | Audio analysis system and method using audio segment characterisation |
| CN104915420B (zh) | | Knowledge base data processing method and system |
| MY194297A (en) | | A method and device for providing search engine label |
| EP4152280A3 (en) | | Method and apparatus for recognizing text, and method and apparatus for training text recognition model |
| SG11201806345QA (en) | | Image processing method and device |
| WO2020044098A3 (zh) | | Ranking method and apparatus for an information stream, and device/terminal/server |
| WO2017188606A3 (ko) | | Terminal device for providing additional information, and providing method |
| SG11201903685PA (en) | | Method and apparatus for barcode identification |