Uncomment: Tree-sitter Based Comment Removal Tool
A fast, accurate, and extensible comment removal tool that uses tree-sitter for parsing, ensuring 100% accuracy in comment identification. Originally created to clean up AI-generated code with excessive comments, it now supports any language with a tree-sitter grammar through its flexible configuration system.
Features
- 100% Accurate: Uses tree-sitter AST parsing to correctly identify comments
- No False Positives: Never removes comment-like content from strings
- Smart Preservation: Keeps important metadata, TODOs, FIXMEs, and language-specific patterns
- Parallel Processing: Multi-threaded processing for improved performance
- Extensible: Supports any language with a tree-sitter grammar through configuration
- Dynamic Grammar Loading: Load grammars from Git, local paths, or pre-compiled libraries
- Configuration System: TOML-based configuration for project-specific settings
- Smart Init Command: Automatically generate configuration based on your project
- Fast: Leverages tree-sitter's optimized parsing
- Safe: Dry-run mode to preview changes
- Built-in Benchmarking: Performance analysis and profiling tools
Supported Languages
Built-in Languages
- Python (.py, .pyw, .pyi, .pyx, .pxd)
- JavaScript (.js, .jsx, .mjs, .cjs)
- TypeScript (.ts, .tsx, .mts, .cts, .d.ts, .d.mts, .d.cts)
- Rust (.rs)
- Go (.go)
- Java (.java)
- C (.c, .h)
- C++ (.cpp, .cc, .cxx, .hpp, .hxx)
- Ruby (.rb, .rake, .gemspec)
- YAML (.yml, .yaml)
- HCL/Terraform (.hcl, .tf, .tfvars)
- Makefile (Makefile, .mk)
- Shell/Bash (.sh, .bash, .zsh, .bashrc, .zshrc)
- Haskell (.hs, .lhs)
- JSON with Comments (.jsonc)
Extensible to Any Language
Through the configuration system, you can add support for any language with a tree-sitter grammar, including:
- Vue, Svelte, Astro (Web frameworks)
- Swift, Kotlin, Dart (Mobile development)
- Zig, Nim (Systems programming)
- Elixir, Clojure, Julia (Functional/Scientific)
- And many more...
Installation
Via Package Managers
Cargo (Rust)
npm (Node.js)
pip (Python)
From source
Requirements
- For building from source: Rust 1.70+
- For npm/pip packages: Pre-built binaries are downloaded automatically
Quick Start
# Generate a configuration file for your project
# Remove comments from files
# Preview changes without modifying files
Usage
Configuration
# Generate a smart configuration based on your project
# Generate a comprehensive configuration with all supported languages
# Interactive configuration setup
# Use a custom configuration file
Init Command Examples
The init command intelligently detects languages in your project:
# Smart detection - analyzes your project and includes only detected languages
# Comprehensive mode - includes configurations for 25+ languages
# Specify output location
# Force overwrite existing configuration
Basic Usage
# Remove comments from a single file
# Preview changes without modifying files
# Process multiple files
# Remove documentation comments/docstrings
# Remove TODO and FIXME comments
# Add custom patterns to preserve
# Process entire directory recursively
# Use parallel processing with 8 threads
# Benchmark performance on a large codebase
# Profile performance with detailed analysis
Default Preservation Rules
Always Preserved
- Comments containing ~keep
- TODO comments (unless --remove-todo)
- FIXME comments (unless --remove-fixme)
- Documentation comments (unless --remove-doc)
Language-Specific Preservation
Python:
- Type hints: # type:, # mypy:
- Linting: # noqa, # pylint:, # flake8:, # ruff:
- Formatting: # fmt:, # isort:
- Other: # pragma:, # NOTE:
JavaScript/TypeScript:
- Type checking: @flow, @ts-ignore, @ts-nocheck
- Linting: eslint-disable, eslint-enable, biome-ignore
- Formatting: prettier-ignore
- Coverage: v8 ignore, c8 ignore, istanbul ignore
- Other: @jsx, @license, @preserve
Rust:
- Attributes and directives (preserved in comment form)
- Doc comments /// and //! (unless --remove-doc)
- Clippy directives: clippy::
Haskell:
- Comments: --
- Haddock: -- |, {-^ ... -}, {-| ... -} (unless --remove-doc)
YAML/HCL/Makefile:
- Standard comment removal while preserving file structure
- Supports both # and // style comments in HCL/Terraform
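The preservation rules above can be pictured as a single predicate applied to each comment. The following is a minimal Python sketch, not uncomment's actual implementation: the function name `should_preserve`, its keyword arguments, and the use of plain substring matching are all assumptions made for illustration. Documentation comments are left out because recognising them needs syntax information rather than text matching.

```python
from typing import Iterable

# Python markers copied from the list above; other languages would have their own set.
PYTHON_MARKERS = (
    "type:", "mypy:", "noqa", "pylint:", "flake8:", "ruff:",
    "fmt:", "isort:", "pragma:", "NOTE:",
)


def should_preserve(
    comment: str,
    *,
    remove_todo: bool = False,
    remove_fixme: bool = False,
    language_markers: Iterable[str] = PYTHON_MARKERS,
    extra_patterns: Iterable[str] = (),
) -> bool:
    """Hypothetical predicate: return True if the comment should be kept."""
    # An explicit ~keep marker always wins.
    if "~keep" in comment:
        return True
    # TODO and FIXME are kept unless the matching removal option is set.
    if "TODO" in comment and not remove_todo:
        return True
    if "FIXME" in comment and not remove_fixme:
        return True
    # Language-specific markers and user-supplied patterns (assumed substring match).
    return any(marker in comment for marker in (*language_markers, *extra_patterns))
```

With this sketch, `should_preserve("# noqa: E501")` is true, while `should_preserve("# TODO: refactor", remove_todo=True)` is false.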
Configuration
Uncomment uses a flexible TOML-based configuration system that allows you to customize behavior for your project.
Configuration File Discovery
Uncomment searches for configuration files in the following order:
- Command-line specified config: --config path/to/config.toml
- .uncommentrc.toml in the current directory
- .uncommentrc.toml in parent directories (up to git root or filesystem root)
- ~/.config/uncomment/config.toml (global configuration)
- Built-in defaults
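The search order above can be sketched as a walk from the working directory up to the git root. This is an illustrative Python sketch under stated assumptions, not the tool's own code: `discover_configs` is a hypothetical helper, and treating a directory that contains `.git` as the git root is an assumption.

```python
from pathlib import Path
from typing import List, Optional

CONFIG_NAME = ".uncommentrc.toml"


def discover_configs(
    start: Path, home: Path, cli_config: Optional[Path] = None
) -> List[Path]:
    """Hypothetical helper: list config files in the documented search order."""
    found: List[Path] = []
    # 1. A config passed on the command line comes first.
    if cli_config is not None:
        found.append(cli_config)
    # 2-3. The current directory, then each parent, stopping at the git root
    #      (or at the filesystem root when no .git directory is found).
    current = start.resolve()
    for directory in [current, *current.parents]:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            found.append(candidate)
        if (directory / ".git").exists():
            break
    # 4. The global configuration under the user's home directory.
    global_config = home / ".config" / "uncomment" / "config.toml"
    if global_config.is_file():
        found.append(global_config)
    # 5. Built-in defaults need no file, so they are not listed here.
    return found
```

Calling `discover_configs(Path.cwd(), Path.home())` returns the files nearest to the working directory first.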
Basic Configuration Example
[]
= false
= false
= false
= ["IMPORTANT", "NOTE", "WARNING"]
= true
= true
[]
= ["py", "pyw", "pyi"]
= ["noqa", "type:", "pragma:", "pylint:"]
[]
# Keep all comments in test files
= false
= false
= false
Dynamic Grammar Loading
You can extend support to any language with a tree-sitter grammar:
# Add Swift support via Git
[]
= "Swift"
= ["swift"]
= ["comment", "multiline_comment"]
= ["MARK:", "TODO:", "FIXME:", "swiftlint:"]
[]
= { = "git", = "https://github.com/alex-pinkus/tree-sitter-swift", = "main" }
# Use a local grammar
[]
= "Custom Language"
= ["custom"]
= ["comment"]
[]
= { = "local", = "/path/to/tree-sitter-custom" }
# Use a pre-compiled library
[]
= "Proprietary Language"
= ["prop"]
= ["comment"]
[]
= { = "library", = "/usr/local/lib/libtree-sitter-proprietary.so" }
Configuration Merging
When multiple configuration files are found, they are merged with the following precedence (highest to lowest):
- Command-line flags
- Local .uncommentrc.toml files (closer to the file being processed wins)
- Global configuration (~/.config/uncomment/config.toml)
- Built-in defaults
Pattern-specific configurations override language configurations for matching files.
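One way to picture the precedence rules is as dictionaries layered from lowest to highest priority, with pattern sections overlaid last for matching files. The Python sketch below is an assumption-laden illustration: the key names (`remove_todo`, `remove_doc`), both helper functions, and the use of `fnmatchcase` for glob matching are hypothetical, and the tool's real glob semantics may differ.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

Settings = Dict[str, Any]


def merge_layers(*layers: Settings) -> Settings:
    """Merge settings passed from lowest to highest precedence."""
    merged: Settings = {}
    for layer in layers:
        merged.update(layer)  # later (higher-precedence) layers win
    return merged


def apply_pattern_overrides(
    path: str, language: Settings, patterns: Dict[str, Settings]
) -> Settings:
    """Overlay every pattern section whose glob matches the file path."""
    result = dict(language)
    for glob, overrides in patterns.items():
        if fnmatchcase(path, glob):
            result.update(overrides)
    return result


# Hypothetical layers, ordered defaults -> global -> local -> command line.
builtin = {"remove_todo": False, "remove_doc": False}
global_cfg = {"remove_doc": True}
local_cfg = {"remove_todo": True}
cli_flags = {"remove_doc": False}

effective = merge_layers(builtin, global_cfg, local_cfg, cli_flags)
# effective == {"remove_todo": True, "remove_doc": False}

test_file = apply_pattern_overrides(
    "tests/test_a.py", effective, {"tests/*.py": {"remove_todo": False}}
)
# test_file == {"remove_todo": False, "remove_doc": False}
```

A plain last-writer-wins merge keeps the precedence order easy to read; merging nested tables would need a recursive variant.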
How It Works
Unlike regex-based tools, uncomment uses tree-sitter to build a proper Abstract Syntax Tree (AST) of your code. This means it understands the difference between:
- Real comments vs comment-like content in strings
- Documentation comments vs regular comments
- Inline comments vs standalone comments
- Language-specific metadata that should be preserved
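The first distinction in the list is the easiest to demonstrate. In the Python sketch below, the standard library's `tokenize` module stands in for tree-sitter purely for illustration (it is a lexer for Python source only, not what uncomment uses); the point is the contrast between pattern matching on text and asking a syntax-aware tool which spans are comments.

```python
import io
import re
import tokenize
from typing import List

SOURCE = 'url = "http://example.com/#anchor"  # real comment\n'


def comments_by_regex(source: str) -> List[str]:
    """Naive approach: treat everything from the first '#' as a comment."""
    return re.findall(r"#.*", source)


def comments_by_tokenizer(source: str) -> List[str]:
    """Syntax-aware approach: only COMMENT tokens count as comments."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


print(comments_by_regex(SOURCE))      # ['#anchor"  # real comment']
print(comments_by_tokenizer(SOURCE))  # ['# real comment']
```

The regex version would cut the string literal in half, while the tokenizer reports only the trailing comment.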
Architecture
The tool is built with a modular, extensible architecture:
- Language Registry: Manages both built-in and dynamically loaded languages
- Grammar Manager: Handles loading grammars from Git, local paths, or compiled libraries
- Configuration System: TOML-based hierarchical configuration with merging
- AST Visitor: Traverses the tree-sitter AST to find comments
- Preservation Engine: Applies rules to determine what to keep
- Output Generator: Produces clean code with comments removed
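The last three stages (find comments, decide what to keep, emit cleaned source) can be shown in miniature. This Python sketch is a toy for Python input only and makes several assumptions: `tokenize` replaces the tree-sitter AST visitor, `strip_comments` and its `keep` callback are hypothetical names, and dropping lines that held only a comment is one possible output policy rather than the tool's documented behaviour.

```python
import io
import tokenize
from typing import Callable


def strip_comments(
    source: str, keep: Callable[[str], bool] = lambda text: False
) -> str:
    """Remove comments from Python source unless keep(comment) returns True."""
    lines = source.splitlines(keepends=True)
    # "Visitor" stage: walk the token stream and look for comments.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # "Preservation" stage: skip non-comments and comments worth keeping.
        if tok.type != tokenize.COMMENT or keep(tok.string):
            continue
        # "Output" stage: a Python comment always runs to the end of its line,
        # so cutting the line at the comment's column removes it.
        row, col = tok.start
        line = lines[row - 1]
        code_part = line[:col].rstrip()
        if not code_part:
            lines[row - 1] = ""  # the line held only a comment: drop it
        else:
            newline = "\n" if line.endswith("\n") else ""
            lines[row - 1] = code_part + newline
    return "".join(lines)


example = 'x = "# not a comment"  # set x\n# TODO: tidy\ny = 1  # plain\n'
print(strip_comments(example, keep=lambda c: "TODO" in c))
```

Here the string literal and the TODO line survive, and the two ordinary trailing comments are removed.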
Key Components
- Dynamic Grammar Loading: Automatically downloads and compiles tree-sitter grammars
- Grammar Caching: Caches compiled grammars for performance
- Configuration Discovery: Searches for configs in project hierarchy
- Pattern Matching: File-pattern-specific configuration overrides
Adding New Languages
With the new configuration system, you can add languages without modifying code:
Method 1: Using Configuration (Recommended)
Add to your .uncommentrc.toml:
[]
= "My Language"
= ["ml", "mli"]
= ["comment"]
= ["TODO", "FIXME"]
[]
= { = "git", = "https://github.com/tree-sitter/tree-sitter-mylang", = "main" }
Method 2: Built-in Support
For frequently used languages:
- Add the tree-sitter parser dependency to Cargo.toml
- Register the language in src/grammar/mod.rs
- Add language configuration in src/languages/registry.rs
Git Hooks
Pre-commit
Add to your .pre-commit-config.yaml:
repos:
  - repo: https://github.com/Goldziher/uncomment
    rev: v2.4.0
    hooks:
      - id: uncomment
Lefthook
Add to your lefthook.yml:
pre-commit:
  commands:
    uncomment:
      run: uncomment {staged_files}
      stage_fixed: true
For both hooks, install uncomment via pip:
Performance
While slightly slower than regex-based approaches due to parsing overhead, the tool is very fast and scales well with parallel processing:
Single-threaded Performance
- Small files (<1000 lines): ~20-30ms
- Large files (>10000 lines): ~100-200ms
Parallel Processing Benchmarks
Throughput increases with thread count, though less than linearly:
| Thread Count | Files/Second | Speedup |
|---|---|---|
| 1 thread | 1,500 | 1.0x |
| 4 threads | 3,900 | 2.6x |
| 8 threads | 5,100 | 3.4x |
Benchmarks were run on a large enterprise codebase with 5,000 mixed-language files.
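The speedup column follows directly from the throughput figures, and dividing it by the thread count gives the parallel efficiency. A short Python sketch of that arithmetic, using only the numbers in the table (the function name `scaling_report` is illustrative):

```python
from typing import Dict, Tuple

# Files per second by thread count, taken from the table above.
FILES_PER_SECOND = {1: 1500, 4: 3900, 8: 5100}


def scaling_report(throughput: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    baseline = throughput[1]
    report = {}
    for threads, rate in sorted(throughput.items()):
        speedup = rate / baseline
        report[threads] = (round(speedup, 2), round(speedup / threads, 3))
    return report


for threads, (speedup, efficiency) in scaling_report(FILES_PER_SECOND).items():
    print(f"{threads} thread(s): {speedup:.1f}x speedup, {efficiency:.1%} efficiency")
```

Efficiency comes out at 65% with 4 threads and 42.5% with 8, so each additional thread contributes less than the previous one.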
Built-in Benchmarking
Use the built-in tools to measure performance on your specific codebase:
# Basic benchmark
# Detailed benchmark with multiple iterations
# Memory and performance profiling
The accuracy gained through AST parsing is worth the small performance cost, and parallel processing makes it suitable for even the largest codebases.
License
MIT