Bio_ClinicalBERT is a domain-specific language model for clinical natural language processing (NLP) that extends BioBERT with additional training on clinical notes. It was initialized from BioBERT-Base v1.0 and further pre-trained on all clinical notes in the MIMIC-III database (~880M words), a collection of ICU patient records. The additional pre-training aims to improve performance on healthcare tasks such as named entity recognition and natural language inference. Notes were divided into sections with a rule-based splitter, and each section was split into sentences with SciSpacy. Pre-training ran for 150,000 steps with a batch size of 32, a maximum sequence length of 128 tokens, and a masked language modeling objective with a masking probability of 0.15. Bio_ClinicalBERT can be loaded through Hugging Face's Transformers library. It supports medical AI research and applications involving electronic health record understanding, clinical decision support, and biomedical information extraction.
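To make the masked language modeling objective concrete, here is a minimal pure-Python sketch of how tokens might be selected with the 0.15 masking probability mentioned above. The 80/10/10 replacement split follows the standard BERT recipe and is an assumption here, since the description does not state it; the function name `mask_tokens`, the toy vocabulary, and the special-token set are hypothetical and serve only as illustration.

```python
import random

MASK_PROB = 0.15  # masking probability stated in the description

# Assumption: BERT-style special tokens are never selected for masking.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def mask_tokens(tokens, vocab, mask_prob=MASK_PROB, seed=None):
    """Return (masked_tokens, labels) for one sequence.

    labels maps each selected position to its original token, which is
    what the model would be asked to predict at that position.
    """
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    # Select roughly mask_prob of the eligible positions (at least one).
    n_to_mask = max(1, round(len(candidates) * mask_prob)) if candidates else 0
    chosen = sorted(rng.sample(candidates, n_to_mask))

    masked = list(tokens)
    labels = {}
    for i in chosen:
        labels[i] = tokens[i]
        roll = rng.random()
        if roll < 0.8:
            masked[i] = "[MASK]"           # assumed: 80% become [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # assumed: 10% become a random token
        # assumed: the remaining 10% keep the original token
    return masked, labels
```

With a sequence of 20 ordinary tokens, this sketch selects 3 positions for prediction; every other position is passed through unchanged.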

Features

  • Pre-trained on all MIMIC-III clinical notes (~880M words)
  • Initialized from BioBERT, which was trained on PubMed and PMC data
  • Targets clinical NLP tasks such as named entity recognition (NER) and natural language inference (NLI)
  • Training notes split into sections by rules and into sentences with SciSpacy
  • Compatible with Hugging Face Transformers (PyTorch, TensorFlow, JAX)
  • Masked language modeling objective with 0.15 masking probability
  • Trained with a max sequence length of 128 tokens
  • Licensed under MIT, supporting open and flexible usage
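
As a rough illustration of the Transformers compatibility listed above, the sketch below loads the model with PyTorch and mean-pools the token embeddings of one note. It assumes the `transformers` and `torch` packages are installed and that the model is published on the Hugging Face Hub under the identifier `emilyalsentzer/Bio_ClinicalBERT`; that identifier is not given in this listing. The helper name `embed_note` and the mean-pooling step are illustrative choices, not part of the model's documented usage.

```python
MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"  # assumed Hub identifier
MAX_LEN = 128  # matches the max sequence length used in pre-training


def embed_note(text, model_id=MODEL_ID, max_len=MAX_LEN):
    """Return one mean-pooled embedding for a piece of clinical text."""
    # Imported lazily so the module can be loaded without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Truncate to the pre-training length; longer notes lose their tail.
    inputs = tokenizer(
        text, truncation=True, max_length=max_len, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

    # Average over real tokens only, using the attention mask.
    mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed_note("Patient admitted with chest pain.")` downloads the weights on first use and returns a tensor with one row per input text. Notes longer than 128 tokens would need to be chunked, for example by section or sentence, before embedding.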

Categories

AI Models

Additional Project Details

Registered

2025-07-02