MX2016003981A - Method and device for training a classifier and for type recognition

Method and device for training a classifier and for type recognition

Info

Publication number
MX2016003981A
Authority
MX
Mexico
Prior art keywords
sample
classifier
term
training
characteristic
Prior art date
Application number
MX2016003981A
Other languages
English (en)
Inventor
Zhang Tao
Wang Pingze
Long Fei
Original Assignee
Xiaomi Inc
Priority date
2015-08-19
Filing date
2015-12-16
Publication date
2017-04-27
Application filed by Xiaomi Inc
Publication of MX2016003981A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/268 Morphological analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method and a device for training a classifier and for type recognition, belonging to the field of natural language processing. The method for training the classifier includes: extracting, from sample information, sample clauses that contain a target keyword; performing binary labeling on each sample clause, according to whether it belongs to the target class, to obtain a sample training set; performing word segmentation on each sample clause in the sample training set to obtain a plurality of words; extracting a specified characteristic set from the plurality of words, the specified characteristic set including at least one characteristic word; constructing a classifier based on the characteristic words in the specified characteristic set; and training the classifier based on the binary labeling results in the sample training set. Because the characteristic words in the specified characteristic set are extracted by performing word segmentation on sample clauses that contain the target keyword, the classifier can accurately predict clauses that contain the target keyword and thus achieve accurate recognition results. Figure 1.
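The training steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not name the feature-selection criterion or the classifier model, so the document-frequency-difference score and the Bernoulli naive Bayes model used here are assumptions (the latter suggested only by the "Bayesian classification" code under Classifications). The function names, the regex-based stand-in for word segmentation, and the "birthday" sample data are all hypothetical.

```python
import math
import re
from collections import Counter


def extract_clauses(text, keyword):
    """Split text into clauses and keep only those containing the target keyword."""
    clauses = re.split(r"[.!?;,\n]+", text)
    return [c.strip() for c in clauses if keyword in c]


def segment(clause):
    """Stand-in word segmentation (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", clause.lower())


def select_features(samples, top_k):
    """Pick characteristic words whose clause frequency differs most between labels.

    The scoring rule is an assumed placeholder; the abstract does not specify one.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for _, label in samples if label)
    n_neg = len(samples) - n_pos
    for words, label in samples:
        (pos_df if label else neg_df).update(set(words))

    def score(word):
        return abs(pos_df[word] / max(n_pos, 1) - neg_df[word] / max(n_neg, 1))

    vocab = set(pos_df) | set(neg_df)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-score(w), w))[:top_k]


def train(samples, features):
    """Fit a Bernoulli naive Bayes model (assumed model) with Laplace smoothing."""
    counts = {True: 0, False: 0}
    df = {True: Counter(), False: Counter()}
    for words, label in samples:
        counts[label] += 1
        df[label].update(set(words) & set(features))
    total = counts[True] + counts[False]
    model = {"features": list(features), "prior": {}, "cond": {}}
    for label in (True, False):
        model["prior"][label] = (counts[label] + 1) / (total + 2)
        model["cond"][label] = {
            w: (df[label][w] + 1) / (counts[label] + 2) for w in features
        }
    return model


def predict(model, clause):
    """Return True if the clause is more likely to belong to the target class."""
    words = set(segment(clause))
    log_p = {}
    for label in (True, False):
        lp = math.log(model["prior"][label])
        for w in model["features"]:
            p = model["cond"][label][w]
            lp += math.log(p if w in words else 1.0 - p)
        log_p[label] = lp
    return log_p[True] > log_p[False]


# Hypothetical binary-labeled sample clauses, all containing the keyword "birthday".
# True means the clause belongs to the (hypothetical) target class.
labeled_clauses = [
    ("my birthday is tomorrow", True),
    ("tomorrow is her birthday", True),
    ("his birthday is next friday", True),
    ("happy birthday to you", False),
    ("wish you a happy birthday", False),
    ("happy birthday dear friend", False),
]
samples = [(segment(clause), label) for clause, label in labeled_clauses]
features = select_features(samples, top_k=4)
model = train(samples, features)
```

With this toy data, `predict(model, "tomorrow is my birthday")` is expected to fall in the target class and `predict(model, "happy birthday to you all")` outside it. A real system would replace `segment` with a proper word segmenter for the language in question and use a far larger labeled set.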
MX2016003981A 2015-08-19 2015-12-16 Method and device for training a classifier and for type recognition. MX2016003981A (es)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510511468.1A CN105117384A (zh) 2015-08-19 2015-08-19 Classifier training method, type recognition method, and device
PCT/CN2015/097615 WO2017028416A1 (zh) 2015-08-19 2015-12-16 Classifier training method, type recognition method, and device

Publications (1)

Publication Number Publication Date
MX2016003981A (es) 2017-04-27

Family

ID=54665378

Family Applications (1)

Application Number Priority Date Filing Date Title
MX2016003981A 2015-08-19 2015-12-16 Method and device for training a classifier and for type recognition.

Country Status (8)

Country Link
US (1) US20170052947A1 (es)
EP (1) EP3133532A1 (es)
JP (1) JP2017535007A (es)
KR (1) KR101778784B1 (es)
CN (1) CN105117384A (es)
MX (1) MX2016003981A (es)
RU (1) RU2643500C2 (es)
WO (1) WO2017028416A1 (es)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
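CN105117384A (zh) * 2015-08-19 2015-12-02 小米科技有限责任公司 Classifier training method, type recognition method, and device
CN106060000B (zh) * 2016-05-06 2020-02-07 青岛海信移动通信技术股份有限公司 Method and device for recognizing verification information
CN106211165B (zh) * 2016-06-14 2020-04-21 北京奇虎科技有限公司 Method and device for detecting foreign-language harassing text messages, and corresponding client
CN107135494B (zh) * 2017-04-24 2020-06-19 北京小米移动软件有限公司 Spam text message recognition method and device
CN110444199B (zh) * 2017-05-27 2022-01-07 腾讯科技(深圳)有限公司 Voice keyword recognition method, device, terminal, and server
CN110019782B (zh) * 2017-09-26 2021-11-02 北京京东尚科信息技术有限公司 Method and device for outputting text categories
CN107704892B (zh) * 2017-11-07 2019-05-17 宁波爱信诺航天信息有限公司 Commodity code classification method and system based on a Bayesian model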
US10726204B2 (en) 2018-05-24 2020-07-28 International Business Machines Corporation Training data expansion for natural language classification
CN109325123B (zh) * 2018-09-29 2020-10-16 武汉斗鱼网络科技有限公司 Bayesian document classification method, device, equipment, and medium based on complement-set features
US11100287B2 (en) * 2018-10-30 2021-08-24 International Business Machines Corporation Classification engine for learning properties of words and multi-word expressions
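CN109979440B (zh) * 2019-03-13 2021-05-11 广州市网星信息技术有限公司 Keyword sample determination method, speech recognition method, device, equipment, and medium
CN109992771B (zh) * 2019-03-13 2020-05-05 北京三快在线科技有限公司 Text generation method and device
CN110083835A (zh) * 2019-04-24 2019-08-02 北京邮电大学 Keyword extraction method and device based on graph and word-sentence collaboration
CN111339297B (zh) * 2020-02-21 2023-04-25 广州天懋信息系统股份有限公司 Network asset anomaly detection method, system, medium, and equipment
CN113688436A (zh) * 2020-05-19 2021-11-23 天津大学 Hardware Trojan detection method fusing PCA and naive Bayes classification
CN112529623B (zh) * 2020-12-14 2023-07-11 中国联合网络通信集团有限公司 Malicious user identification method, device, and equipment
CN112925958A (zh) * 2021-02-05 2021-06-08 深圳力维智联技术有限公司 Multi-source heterogeneous data adaptation method, device, equipment, and readable storage medium
CN114969239A (zh) * 2021-02-27 2022-08-30 北京紫冬认知科技有限公司 Medical case data processing method, device, electronic equipment, and storage medium
CN114281983B (zh) * 2021-04-05 2024-04-12 北京智慧星光信息技术有限公司 Hierarchical text classification method, system, electronic equipment, and storage medium
CN113570269B (zh) * 2021-08-03 2024-10-18 工银科技有限公司 Operation and maintenance project management method, device, equipment, medium, and program product
CN113705818B (zh) * 2021-08-31 2024-04-19 支付宝(杭州)信息技术有限公司 Method and device for attributing fluctuations in payment indicators
CN114706991B (zh) * 2022-01-27 2025-08-05 清华大学 Knowledge network construction method, device, equipment, and storage medium
CN116094886B (zh) * 2023-03-09 2023-08-25 浙江万胜智能科技股份有限公司 Carrier communication data processing method and system in a dual-mode module
CN116467604A (zh) * 2023-04-27 2023-07-21 中国工商银行股份有限公司 Dialogue state recognition method, device, computer equipment, and storage medium
CN116894216A (zh) * 2023-07-19 2023-10-17 中国工商银行股份有限公司 Method and device for determining server hardware alarm categories, and electronic equipment
CN117910875B (zh) * 2024-01-22 2024-07-19 青海省科技发展服务中心 Stress resistance evaluation system for Elymus resources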

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11203318A (ja) * 1998-01-19 1999-07-30 Seiko Epson Corp Document classification method and device, and recording medium storing a document classification processing program
US6192360B1 (en) * 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
US7376635B1 (en) * 2000-07-21 2008-05-20 Ford Global Technologies, Llc Theme-based system and method for classifying documents
US7624006B2 (en) * 2004-09-15 2009-11-24 Microsoft Corporation Conditional maximum likelihood estimation of naïve bayes probability models
JP2006301972A (ja) 2005-04-20 2006-11-02 Mihatenu Yume:Kk Electronic secretary device
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8082151B2 (en) * 2007-09-18 2011-12-20 At&T Intellectual Property I, Lp System and method of generating responses to text-based messages
CN101516071B (zh) * 2008-02-18 2013-01-23 中国移动通信集团重庆有限公司 Spam short message classification method
US20100161406A1 (en) * 2008-12-23 2010-06-24 Motorola, Inc. Method and Apparatus for Managing Classes and Keywords and for Retrieving Advertisements
JP5346841B2 (ja) * 2010-02-22 2013-11-20 株式会社野村総合研究所 Document classification system, document classification program, and document classification method
US8892488B2 (en) * 2011-06-01 2014-11-18 Nec Laboratories America, Inc. Document classification with weighted supervised n-gram embedding
RU2491622C1 (ru) * 2012-01-25 2013-08-27 Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" Method for classifying documents into categories
CN103246686A (zh) * 2012-02-14 2013-08-14 阿里巴巴集团控股有限公司 Text classification method and device, and feature processing method and device for text classification
US9910909B2 (en) * 2013-01-23 2018-03-06 24/7 Customer, Inc. Method and apparatus for extracting journey of life attributes of a user from user interactions
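CN103336766B (zh) * 2013-07-04 2016-12-28 微梦创科网络科技(中国)有限公司 Short-text spam recognition and modeling method and device
CN103501487A (zh) * 2013-09-18 2014-01-08 小米科技有限责任公司 Classifier updating method, device, terminal, server, and system
CN103500195B (zh) * 2013-09-18 2016-08-17 小米科技有限责任公司 Classifier updating method, device, system, and equipment
CN103885934B (zh) * 2014-02-19 2017-05-03 中国专利信息中心 Automatic key phrase extraction method for patent documents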
US10394953B2 (en) * 2015-07-17 2019-08-27 Facebook, Inc. Meme detection in digital chatter analysis
CN105117384A (zh) * 2015-08-19 2015-12-02 小米科技有限责任公司 Classifier training method, type recognition method, and device

Also Published As

Publication number Publication date
CN105117384A (zh) 2015-12-02
RU2643500C2 (ru) 2018-02-01
KR101778784B1 (ko) 2017-09-26
WO2017028416A1 (zh) 2017-02-23
EP3133532A1 (en) 2017-02-22
US20170052947A1 (en) 2017-02-23
RU2016111677A (ru) 2017-10-04
KR20170032880A (ko) 2017-03-23
JP2017535007A (ja) 2017-11-24

Similar Documents

Publication Publication Date Title
MX2016003981A (es) Method and device for training a classifier and for type recognition.
PH12018501058A1 (en) Order clustering and malicious information combating method and apparatus
EP3926526A3 (en) Optical character recognition method and apparatus, electronic device and storage medium
EP4075395A3 (en) Method and apparatus of training anti-spoofing model, method and apparatus of performing anti-spoofing, and device
GB2549875A (en) Automated content classification/filtering
MX2016003769A (es) Method and device for region extraction.
GB2575611A (en) Systems and methods for model-assisted cohort selection
WO2019133928A8 (en) Hierarchical, parallel models for extracting in real time high-value information from data streams and system and method for creation of same
US20170091318A1 (en) Apparatus and method for extracting keywords from a single document
MX2016002854A (es) Content-based image retrieval.
CN108959242A (zh) Target entity recognition method and device based on part-of-speech features of Chinese characters
JP2016508264A5 (es)
MX2017008583A (es) Discrimination of ambiguous expressions to improve the user experience.
MX365897B (es) Method and apparatus for determining similarity, and terminal.
CN104680178B (zh) Image classification method based on transfer-learning multi-attractor cellular automata
CN105117740B (zh) Font recognition method and device
CN105139041A (zh) Image-based language identification method and device
GB2523973A (en) Audio analysis system and method using audio segment characterisation
CN104915420B (zh) Knowledge base data processing method and system
MY194297A (en) A method and device for providing search engine label
EP4152280A3 (en) Method and apparatus for recognizing text, and method and apparatus for training text recognition model
SG11201806345QA (en) Image processing method and device
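WO2020044098A3 (zh) Ranking method and apparatus in an information feed, and device/terminal/server
WO2017188606A3 (ko) Terminal device for providing additional information, and providing method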
SG11201903685PA (en) Method and apparatus for barcode identification