Uncomment: Tree-sitter Based Comment Removal Tool

A fast, accurate, and extensible comment removal tool that uses tree-sitter for parsing, ensuring 100% accuracy in comment identification. Originally created to clean up AI-generated code with excessive comments, it now supports any language with a tree-sitter grammar through its flexible configuration system.

Features

100% Accurate: Uses tree-sitter AST parsing to correctly identify comments
No False Positives: Never removes comment-like content from strings
Smart Preservation: Keeps important metadata, TODOs, FIXMEs, and language-specific patterns
Parallel Processing: Multi-threaded processing for improved performance
Extensible: Supports any language with a tree-sitter grammar through configuration
Dynamic Grammar Loading: Load grammars from Git, local paths, or pre-compiled libraries
Configuration System: TOML-based configuration for project-specific settings
Smart Init Command: Automatically generate configuration based on your project
Fast: Leverages tree-sitter's optimized parsing
Safe: Dry-run mode to preview changes
Built-in Benchmarking: Performance analysis and profiling tools

Supported Languages

Built-in Languages

Python (.py, .pyw, .pyi, .pyx, .pxd)
JavaScript (.js, .jsx, .mjs, .cjs)
TypeScript (.ts, .tsx, .mts, .cts, .d.ts, .d.mts, .d.cts)
Rust (.rs)
Go (.go)
Java (.java)
C (.c, .h)
C++ (.cpp, .cc, .cxx, .hpp, .hxx)
Ruby (.rb, .rake, .gemspec)
YAML (.yml, .yaml)
HCL/Terraform (.hcl, .tf, .tfvars)
Makefile (Makefile, .mk)
Shell/Bash (.sh, .bash, .zsh, .bashrc, .zshrc)
Haskell (.hs, .lhs)
JSON with Comments (.jsonc)

Extensible to Any Language

Through the configuration system, you can add support for any language with a tree-sitter grammar, including:

Vue, Svelte, Astro (Web frameworks)
Swift, Kotlin, Dart (Mobile development)
Zig, Nim (Systems programming)
Elixir, Clojure, Julia (Functional/Scientific)
And many more...

Installation

Via Package Managers

Cargo (Rust)

cargo install uncomment

npm (Node.js)

npm install -g uncomment-cli

pip (Python)

pip install uncomment

From source

git clone https://github.com/Goldziher/uncomment.git
cd uncomment
cargo install --path .

Requirements

For building from source: Rust 1.70+
For npm/pip packages: Pre-built binaries are downloaded automatically

Quick Start

# Generate a configuration file for your project
uncomment init

# Remove comments from files
uncomment src/

# Preview changes without modifying files
uncomment --dry-run src/

Usage

Configuration

# Generate a smart configuration based on your project
uncomment init

# Generate a comprehensive configuration with all supported languages
uncomment init --comprehensive

# Interactive configuration setup
uncomment init --interactive

# Use a custom configuration file
uncomment --config my-config.toml src/

Init Command Examples

The init command intelligently detects languages in your project:

# Smart detection - analyzes your project and includes only detected languages
$ uncomment init
Detected languages in your project:
- 150 rust files
- 89 typescript files
- 45 python files
- 12 vue files (requires custom grammar)
- 8 dockerfile files (requires custom grammar)

Generated .uncommentrc.toml with configurations for detected languages.

# Comprehensive mode - includes configurations for 25+ languages
$ uncomment init --comprehensive
Generated comprehensive configuration with all supported languages.

# Specify output location
$ uncomment init --output config/uncomment.toml

# Force overwrite existing configuration
$ uncomment init --force

Basic Usage

# Remove comments from a single file
uncomment file.py

# Preview changes without modifying files
uncomment --dry-run file.py

# Process multiple files
uncomment src/*.py

# Remove documentation comments/docstrings
uncomment --remove-doc file.py

# Remove TODO and FIXME comments
uncomment --remove-todo --remove-fixme file.py

# Add custom patterns to preserve
uncomment --ignore-patterns "HACK" --ignore-patterns "WARNING" file.py

# Process entire directory recursively
uncomment src/

# Use parallel processing with 8 threads
uncomment --threads 8 src/

# Benchmark performance on a large codebase
uncomment benchmark --target /path/to/repo --iterations 3

# Profile performance with detailed analysis
uncomment profile /path/to/repo

Default Preservation Rules

Preserved by Default

Comments containing ~keep
TODO comments (unless --remove-todo)
FIXME comments (unless --remove-fixme)
Documentation comments (unless --remove-doc)
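
The rules above can be read as a small decision function. The sketch below is only an illustration in Python, not uncomment's actual implementation: the function name should_preserve, its parameters, and the use of plain case-sensitive substring matching are all assumptions made for the example.

```python
def should_preserve(comment: str, *, is_doc: bool = False,
                    remove_todo: bool = False, remove_fixme: bool = False,
                    remove_doc: bool = False,
                    extra_patterns: tuple[str, ...] = ()) -> bool:
    """Return True if the comment would be kept under the rules listed above.

    Hypothetical helper: substring matching is an assumption, not the
    tool's documented matching behavior.
    """
    # A ~keep marker wins regardless of any removal flag.
    if "~keep" in comment:
        return True
    # TODO and FIXME comments are kept unless explicitly removed.
    if "TODO" in comment and not remove_todo:
        return True
    if "FIXME" in comment and not remove_fixme:
        return True
    # Documentation comments are kept unless --remove-doc is given.
    if is_doc and not remove_doc:
        return True
    # Extra patterns stand in for --ignore-patterns / preserve_patterns.
    return any(pattern in comment for pattern in extra_patterns)


print(should_preserve("# TODO: refactor"))
print(should_preserve("# TODO: refactor", remove_todo=True))
print(should_preserve("# HACK: workaround", extra_patterns=("HACK",)))
```

In this sketch a comment containing ~keep survives even when a removal flag is set. That ordering is a design assumption of the example.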

Language-Specific Preservation

Python:

Type hints: # type:, # mypy:
Linting: # noqa, # pylint:, # flake8:, # ruff:
Formatting: # fmt:, # isort:
Other: # pragma:, # NOTE:

JavaScript/TypeScript:

Type checking: @flow, @ts-ignore, @ts-nocheck
Linting: eslint-disable, eslint-enable, biome-ignore
Formatting: prettier-ignore
Coverage: v8 ignore, c8 ignore, istanbul ignore
Other: @jsx, @license, @preserve

Rust:

Attributes and directives (preserved in comment form)
Doc comments /// and //! (unless --remove-doc)
Clippy directives: clippy::

Haskell:

Comments: --
Haddock: -- |, {-^ ... -}, {-| ... -} (unless --remove-doc)

YAML/HCL/Makefile:

Standard comment removal while preserving file structure
Supports both # and // style comments in HCL/Terraform

Configuration

Uncomment uses a flexible TOML-based configuration system that allows you to customize behavior for your project.

Configuration File Discovery

Uncomment searches for configuration files in the following order:

1. Command-line specified config: --config path/to/config.toml
2. .uncommentrc.toml in the current directory
3. .uncommentrc.toml in parent directories (up to git root or filesystem root)
4. ~/.config/uncomment/config.toml (global configuration)
5. Built-in defaults
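
The search order can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: the function name discover_configs is hypothetical, and treating a directory that contains .git as the git root is an assumption about how the boundary is detected.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".uncommentrc.toml"


def discover_configs(start_dir: Path, cli_config: Optional[Path] = None,
                     home: Optional[Path] = None) -> list[Path]:
    """Return candidate config files, highest priority first (hypothetical helper)."""
    found: list[Path] = []

    # A path given via --config comes first.
    if cli_config is not None:
        found.append(cli_config)

    # Walk from the current directory up through its parents.
    current = start_dir.resolve()
    for directory in [current, *current.parents]:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            found.append(candidate)
        # Assumption: a .git entry marks the git root, where the walk stops.
        if (directory / ".git").exists():
            break

    # The global configuration under the home directory comes last.
    home_dir = home if home is not None else Path.home()
    global_config = home_dir / ".config" / "uncomment" / "config.toml"
    if global_config.is_file():
        found.append(global_config)

    # An empty list means only the built-in defaults apply.
    return found


print(discover_configs(Path(".")))
```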

Basic Configuration Example

[global]
remove_todos = false
remove_fixme = false
remove_docs = false
preserve_patterns = ["IMPORTANT", "NOTE", "WARNING"]
use_default_ignores = true
respect_gitignore = true

[languages.python]
extensions = ["py", "pyw", "pyi"]
preserve_patterns = ["noqa", "type:", "pragma:", "pylint:"]

[patterns."tests/**/*.py"]
# Keep TODO, FIXME, and documentation comments in test files
remove_todos = false
remove_fixme = false
remove_docs = false

Dynamic Grammar Loading

You can extend support to any language with a tree-sitter grammar:

# Add Swift support via Git
[languages.swift]
name = "Swift"
extensions = ["swift"]
comment_nodes = ["comment", "multiline_comment"]
preserve_patterns = ["MARK:", "TODO:", "FIXME:", "swiftlint:"]

[languages.swift.grammar]
source = { type = "git", url = "https://github.com/alex-pinkus/tree-sitter-swift", branch = "main" }

# Use a local grammar
[languages.custom]
name = "Custom Language"
extensions = ["custom"]
comment_nodes = ["comment"]

[languages.custom.grammar]
source = { type = "local", path = "/path/to/tree-sitter-custom" }

# Use a pre-compiled library
[languages.proprietary]
name = "Proprietary Language"
extensions = ["prop"]
comment_nodes = ["comment"]

[languages.proprietary.grammar]
source = { type = "library", path = "/usr/local/lib/libtree-sitter-proprietary.so" }

Configuration Merging

When multiple configuration files are found, they are merged with the following precedence (highest to lowest):

1. Command-line flags
2. Local .uncommentrc.toml files (closer to the file being processed wins)
3. Global configuration (~/.config/uncomment/config.toml)
4. Built-in defaults

Pattern-specific configurations override language configurations for matching files.
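
As a rough illustration of this precedence, the Python sketch below layers plain dictionaries from lowest to highest priority. The key names come from the configuration example earlier in this document. The merge_settings helper and the shallow key-by-key merge are assumptions; the tool may combine list-valued settings such as preserve_patterns differently.

```python
BUILTIN_DEFAULTS = {"remove_todos": False, "remove_fixme": False, "remove_docs": False}


def merge_settings(*layers: dict) -> dict:
    """Merge layers given from lowest to highest precedence (hypothetical helper)."""
    merged: dict = {}
    for layer in layers:
        # Later (higher-precedence) layers overwrite earlier values.
        merged.update(layer)
    return merged


global_config = {"remove_todos": True}    # ~/.config/uncomment/config.toml
parent_config = {"remove_fixme": True}    # .uncommentrc.toml in a parent directory
nearest_config = {"remove_fixme": False}  # .uncommentrc.toml next to the file
cli_flags = {"remove_docs": True}         # corresponds to --remove-doc

effective = merge_settings(
    BUILTIN_DEFAULTS, global_config, parent_config, nearest_config, cli_flags
)
print(effective)
```

Here the nearest local file overrides its parent for remove_fixme, the global file supplies remove_todos, and the command-line flag decides remove_docs.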

How It Works

Unlike regex-based tools, uncomment uses tree-sitter to build a proper Abstract Syntax Tree (AST) of your code. This means it understands the difference between:

Real comments vs comment-like content in strings
Documentation comments vs regular comments
Inline comments vs standalone comments
Language-specific metadata that should be preserved
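
To make the first distinction concrete, here is a small Python illustration that uses the standard-library tokenize module as a stand-in for tree-sitter. It is not how uncomment is implemented, and the helper names are hypothetical. It only shows that a tokenizer or parser treats a # inside a string literal as part of the string, while a naive regex cannot tell the difference.

```python
import io
import re
import tokenize

SOURCE = 'url = "http://example.com/#anchor"  # real comment\n'


def find_comments(source: str) -> list[str]:
    """Return the text of every real comment token in Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


def naive_regex_comments(source: str) -> list[str]:
    """A naive regex that treats every '#' as the start of a comment."""
    return re.findall(r"#.*", source)


print(find_comments(SOURCE))         # only the trailing comment
print(naive_regex_comments(SOURCE))  # also swallows part of the URL string
```

Removing the regex match would truncate the string literal and break the code. The token-based result covers only the actual comment.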

Architecture

The tool is built with a modular, extensible architecture:

Language Registry: Manages both built-in and dynamically loaded languages
Grammar Manager: Handles loading grammars from Git, local paths, or compiled libraries
Configuration System: TOML-based hierarchical configuration with merging
AST Visitor: Traverses the tree-sitter AST to find comments
Preservation Engine: Applies rules to determine what to keep
Output Generator: Produces clean code with comments removed
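
The last two stages can be pictured with a short Python sketch. It assumes the comment spans have already been located by the visitor stage, and every name in it (Span, strip_comments, keep) is hypothetical rather than part of uncomment's code.

```python
from typing import Callable

Span = tuple[int, int]  # [start, end) character offsets of one comment


def strip_comments(source: str, spans: list[Span],
                   keep: Callable[[str], bool]) -> str:
    """Remove every comment span that the keep() rule does not preserve."""
    removable = [(s, e) for s, e in spans if not keep(source[s:e])]
    # Delete from the end of the text backwards so earlier offsets stay valid.
    for start, end in sorted(removable, reverse=True):
        source = source[:start] + source[end:]
    return source


code = "x = 1  // plain note\ny = 2  // TODO: tune\n"
first = code.index("// plain note")
second = code.index("// TODO: tune")
spans = [(first, first + len("// plain note")),
         (second, second + len("// TODO: tune"))]

cleaned = strip_comments(code, spans, keep=lambda text: "TODO" in text)
print(cleaned)
```

This sketch leaves the whitespace that preceded a removed comment in place. A real output stage would likely tidy that up as well.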

Key Components

Dynamic Grammar Loading: Automatically downloads and compiles tree-sitter grammars
Grammar Caching: Caches compiled grammars for performance
Configuration Discovery: Searches for configs in project hierarchy
Pattern Matching: File-pattern-specific configuration overrides

Adding New Languages

With the new configuration system, you can add languages without modifying code:

Method 1: Using Configuration (Recommended)

Add to your .uncommentrc.toml:

[languages.mylang]
name = "My Language"
extensions = ["ml", "mli"]
comment_nodes = ["comment"]
preserve_patterns = ["TODO", "FIXME"]

[languages.mylang.grammar]
source = { type = "git", url = "https://github.com/tree-sitter/tree-sitter-mylang", branch = "main" }

Method 2: Built-in Support

For frequently used languages, built-in support can be added in the source tree:

1. Add the tree-sitter parser dependency to Cargo.toml
2. Register the language in src/grammar/mod.rs
3. Add language configuration in src/languages/registry.rs

Git Hooks

Pre-commit

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/Goldziher/uncomment
    rev: v2.4.0
    hooks:
      - id: uncomment

Lefthook

Add to your lefthook.yml:

pre-commit:
  commands:
    uncomment:
      run: uncomment {staged_files}
      stage_fixed: true

For both hooks, install uncomment via pip:

pip install uncomment

Performance

Parsing adds some overhead compared with regex-based approaches, but throughput remains high and improves with parallel processing:

Single-threaded Performance

Small files (<1000 lines): ~20-30ms
Large files (>10000 lines): ~100-200ms

Parallel Processing Benchmarks

Throughput increases with thread count, although the speedup is sublinear:

Thread Count    Files/Second    Speedup
1 thread        1,500           1.0x
4 threads       3,900           2.6x
8 threads       5,100           3.4x

Benchmarks were run on a large enterprise codebase with 5,000 mixed-language files.
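
The speedup column follows directly from the throughput figures: speedup is files per second divided by the single-thread value, and parallel efficiency is speedup divided by thread count (2.6 / 4 = 0.65 and 3.4 / 8 = 0.425). The short Python check below reproduces that arithmetic. The scaling helper is a hypothetical name used only for this example.

```python
THROUGHPUT = {1: 1500, 4: 3900, 8: 5100}  # threads -> files/second, from the table


def scaling(throughput: dict[int, int]) -> dict[int, tuple[float, float]]:
    """Map thread count to (speedup, parallel efficiency) relative to 1 thread."""
    baseline = throughput[1]
    result = {}
    for threads, files_per_second in throughput.items():
        speedup = files_per_second / baseline
        result[threads] = (speedup, speedup / threads)
    return result


for threads, (speedup, efficiency) in scaling(THROUGHPUT).items():
    print(f"{threads} thread(s): speedup {speedup:.1f}x, efficiency {efficiency:.3f}")
```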

Built-in Benchmarking

Use the built-in tools to measure performance on your specific codebase:

# Basic benchmark
uncomment benchmark --target /path/to/repo

# Detailed benchmark with multiple iterations
uncomment benchmark --target /path/to/repo --iterations 5 --threads 8

# Memory and performance profiling
uncomment profile /path/to/repo

The accuracy gained through AST parsing is worth the small performance cost, and parallel processing makes it suitable for even the largest codebases.

License

MIT
