uncomment 2.5.0

A CLI tool to remove comments from code using tree-sitter for accurate parsing
# Uncomment: Tree-sitter Based Comment Removal Tool

A fast, accurate, and extensible comment removal tool that uses tree-sitter for parsing, ensuring 100% accuracy in comment identification. Originally created to clean up AI-generated code with excessive comments, it now supports any language with a tree-sitter grammar through its flexible configuration system.

## Features

- **100% Accurate**: Uses tree-sitter AST parsing to correctly identify comments
- **No False Positives**: Never removes comment-like content from strings
- **Smart Preservation**: Keeps important metadata, TODOs, FIXMEs, and language-specific patterns
- **Parallel Processing**: Multi-threaded processing for improved performance
- **Extensible**: Supports any language that has a tree-sitter grammar, through configuration
- **Dynamic Grammar Loading**: Load grammars from Git, local paths, or pre-compiled libraries
- **Configuration System**: TOML-based configuration for project-specific settings
- **Smart Init Command**: Automatically generate configuration based on your project
- **Fast**: Leverages tree-sitter's optimized parsing
- **Safe**: Dry-run mode to preview changes
- **Built-in Benchmarking**: Performance analysis and profiling tools

## Supported Languages

### Built-in Languages

- Python (.py, .pyw, .pyi, .pyx, .pxd)
- JavaScript (.js, .jsx, .mjs, .cjs)
- TypeScript (.ts, .tsx, .mts, .cts, .d.ts, .d.mts, .d.cts)
- Rust (.rs)
- Go (.go)
- Java (.java)
- C (.c, .h)
- C++ (.cpp, .cc, .cxx, .hpp, .hxx)
- Ruby (.rb, .rake, .gemspec)
- YAML (.yml, .yaml)
- HCL/Terraform (.hcl, .tf, .tfvars)
- Makefile (Makefile, .mk)
- Shell/Bash (.sh, .bash, .zsh, .bashrc, .zshrc)
- Haskell (.hs, .lhs)
- JSON with Comments (.jsonc)

### Extensible to Any Language

Through the configuration system, you can add support for any language with a tree-sitter grammar, including:

- Vue, Svelte, Astro (Web frameworks)
- Swift, Kotlin, Dart (Mobile development)
- Zig, Nim (Systems programming)
- Elixir, Clojure, Julia (Functional/Scientific)
- And many more...

## Installation

### Via Package Managers

#### Homebrew (macOS/Linux)

```bash
brew tap goldziher/tap
brew install uncomment
```

#### Cargo (Rust)

```bash
cargo install uncomment
```

#### npm (Node.js)

```bash
npm install -g uncomment-cli
```

#### pip (Python)

```bash
pip install uncomment
```

### From source

```bash
git clone https://github.com/Goldziher/uncomment.git
cd uncomment
cargo install --path .
```

### Requirements

- For building from source: Rust 1.70+
- For npm/pip packages: Pre-built binaries are downloaded automatically

## Quick Start

```bash
# Generate a configuration file for your project
uncomment init

# Remove comments from files
uncomment src/

# Preview changes without modifying files
uncomment --dry-run src/
```

## Usage

### Configuration

```bash
# Generate a smart configuration based on your project
uncomment init

# Generate a comprehensive configuration with all supported languages
uncomment init --comprehensive

# Interactive configuration setup
uncomment init --interactive

# Use a custom configuration file
uncomment --config my-config.toml src/
```

#### Init Command Examples

The `init` command intelligently detects languages in your project:

```bash
# Smart detection - analyzes your project and includes only detected languages
$ uncomment init
Detected languages in your project:
- 150 rust files
- 89 typescript files
- 45 python files
- 12 vue files (requires custom grammar)
- 8 dockerfile files (requires custom grammar)

Generated .uncommentrc.toml with configurations for detected languages.

# Comprehensive mode - includes configurations for 25+ languages
$ uncomment init --comprehensive
Generated comprehensive configuration with all supported languages.

# Specify output location
$ uncomment init --output config/uncomment.toml

# Force overwrite existing configuration
$ uncomment init --force
```

### Basic Usage

```bash
# Remove comments from a single file
uncomment file.py

# Preview changes without modifying files
uncomment --dry-run file.py

# Process multiple files
uncomment src/*.py

# Remove documentation comments/docstrings
uncomment --remove-doc file.py

# Remove TODO and FIXME comments
uncomment --remove-todo --remove-fixme file.py

# Add custom patterns to preserve
uncomment --ignore-patterns "HACK" --ignore-patterns "WARNING" file.py

# Process entire directory recursively
uncomment src/

# Use parallel processing with 8 threads
uncomment --threads 8 src/

# Benchmark performance on a large codebase
uncomment benchmark --target /path/to/repo --iterations 3

# Profile performance with detailed analysis
uncomment profile /path/to/repo
```

## Default Preservation Rules

### Preserved by Default

- Comments containing `~keep`
- TODO comments (unless `--remove-todo`)
- FIXME comments (unless `--remove-fixme`)
- Documentation comments (unless `--remove-doc`)
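The rules above can be read as a small decision function. The following is a minimal sketch, not the tool's actual implementation: the function name `should_preserve`, its parameters, and the case-sensitive substring matching are all assumptions made for illustration. Documentation comments are left out because recognizing them needs syntax context rather than the comment text alone.

```python
from typing import Sequence


def should_preserve(
    comment: str,
    *,
    remove_todo: bool = False,
    remove_fixme: bool = False,
    extra_patterns: Sequence[str] = (),
) -> bool:
    """Decide whether a comment survives (hypothetical helper, substring matching assumed)."""
    # The ~keep marker wins regardless of any flag.
    if "~keep" in comment:
        return True
    # TODO and FIXME are kept unless the matching removal flag is set.
    if "TODO" in comment and not remove_todo:
        return True
    if "FIXME" in comment and not remove_fixme:
        return True
    # Anything else is kept only if it matches a user-supplied pattern.
    return any(pattern in comment for pattern in extra_patterns)


print(should_preserve("# TODO: tidy up"))                    # kept by default
print(should_preserve("# TODO: tidy up", remove_todo=True))  # removed when flagged
```

Here `remove_todo` and `remove_fixme` stand in for `--remove-todo` and `--remove-fixme`, and `extra_patterns` for repeated `--ignore-patterns` values.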

### Language-Specific Preservation

**Python:**

- Type hints: `# type:`, `# mypy:`
- Linting: `# noqa`, `# pylint:`, `# flake8:`, `# ruff:`
- Formatting: `# fmt:`, `# isort:`
- Other: `# pragma:`, `# NOTE:`

**JavaScript/TypeScript:**

- Type checking: `@flow`, `@ts-ignore`, `@ts-nocheck`
- Linting: `eslint-disable`, `eslint-enable`, `biome-ignore`
- Formatting: `prettier-ignore`
- Coverage: `v8 ignore`, `c8 ignore`, `istanbul ignore`
- Other: `@jsx`, `@license`, `@preserve`

**Rust:**

- Attributes and directives (preserved in comment form)
- Doc comments `///` and `//!` (unless `--remove-doc`)
- Clippy directives: `clippy::`

**Go:**

- Documentation comments: `//` comments that immediately precede `package`, `func`, `type`, `const`, or `var` declarations (unless `--remove-doc`)
- Regular comments: All other `//` and `/* */` comments are removed
- Build constraints: `//go:build`, `//+build`
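To make the Go documentation-comment rule concrete, here is a rough line-based approximation. It is only a sketch: uncomment works on the tree-sitter AST, whereas this hypothetical `is_go_doc_comment` helper inspects raw lines, ignores `/* */` comments, and does not treat build constraints specially.

```python
from typing import List

# Keywords that start the declarations named in the rule above.
GO_DECL_KEYWORDS = ("package", "func", "type", "const", "var")


def is_go_doc_comment(lines: List[str], index: int) -> bool:
    """Return True if the `//` line at `index` sits in a comment block directly above a declaration."""
    if not lines[index].lstrip().startswith("//"):
        return False
    # Skip the rest of the contiguous comment block.
    j = index + 1
    while j < len(lines) and lines[j].lstrip().startswith("//"):
        j += 1
    if j >= len(lines):
        return False
    # A blank line or any non-declaration line breaks the association.
    words = lines[j].split(None, 1)
    return bool(words) and words[0] in GO_DECL_KEYWORDS


go_source = [
    "// Package demo does things.",
    "package demo",
    "",
    "// stray note",
    "",
    "// Add sums two ints.",
    "func Add(a, b int) int {",
    "    // inline detail",
    "    return a + b",
    "}",
]
for i, line in enumerate(go_source):
    if line.lstrip().startswith("//"):
        print(f"{line.strip()!r}: doc comment = {is_go_doc_comment(go_source, i)}")
```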

**Haskell:**

- Comments: `--`
- Haddock: `-- |`, `{-^ ... -}`, `{-| ... -}` (unless `--remove-doc`)

**YAML/HCL/Makefile:**

- Standard comment removal while preserving file structure
- Supports both `#` and `//` style comments in HCL/Terraform

## Configuration

Uncomment uses a flexible TOML-based configuration system that allows you to customize behavior for your project.

### Configuration File Discovery

Uncomment searches for configuration files in the following order:

1. Command-line specified config: `--config path/to/config.toml`
2. `.uncommentrc.toml` in the current directory
3. `.uncommentrc.toml` in parent directories (up to git root or filesystem root)
4. `~/.config/uncomment/config.toml` (global configuration)
5. Built-in defaults
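The search order can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, a directory containing `.git` is assumed to mark the git root, and returning `None` stands for falling back to the built-in defaults.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".uncommentrc.toml"


def find_local_config(start: Path) -> Optional[Path]:
    """Walk from `start` upward and return the first .uncommentrc.toml found."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
        # Assumption: a `.git` entry marks the git root, where the walk stops.
        if (directory / ".git").exists():
            break
    return None


def discover_config(
    start: Path,
    explicit: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Apply the documented order: --config, local files, global file, defaults."""
    if explicit is not None:
        return explicit
    local = find_local_config(start)
    if local is not None:
        return local
    global_config = (home or Path.home()) / ".config" / "uncomment" / "config.toml"
    # None means no file was found, so built-in defaults would apply.
    return global_config if global_config.is_file() else None
```

Calling `discover_config(Path.cwd())` mirrors running `uncomment` without `--config`; passing `explicit=` mirrors `--config path/to/config.toml`.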

### Basic Configuration Example

```toml
[global]
remove_todos = false
remove_fixme = false
remove_docs = false
preserve_patterns = ["IMPORTANT", "NOTE", "WARNING"]
use_default_ignores = true
respect_gitignore = true

[languages.python]
extensions = ["py", "pyw", "pyi"]
preserve_patterns = ["noqa", "type:", "pragma:", "pylint:"]

[patterns."tests/**/*.py"]
# Keep all comments in test files
remove_todos = false
remove_fixme = false
remove_docs = false
```

### Dynamic Grammar Loading

You can extend support to any language with a tree-sitter grammar:

```toml
# Add Swift support via Git
[languages.swift]
name = "Swift"
extensions = ["swift"]
comment_nodes = ["comment", "multiline_comment"]
preserve_patterns = ["MARK:", "TODO:", "FIXME:", "swiftlint:"]

[languages.swift.grammar]
source = { type = "git", url = "https://github.com/alex-pinkus/tree-sitter-swift", branch = "main" }

# Use a local grammar
[languages.custom]
name = "Custom Language"
extensions = ["custom"]
comment_nodes = ["comment"]

[languages.custom.grammar]
source = { type = "local", path = "/path/to/tree-sitter-custom" }

# Use a pre-compiled library
[languages.proprietary]
name = "Proprietary Language"
extensions = ["prop"]
comment_nodes = ["comment"]

[languages.proprietary.grammar]
source = { type = "library", path = "/usr/local/lib/libtree-sitter-proprietary.so" }
```

### Configuration Merging

When multiple configuration files are found, they are merged with the following precedence (highest to lowest):

1. Command-line flags
2. Local `.uncommentrc.toml` files (closer to the file being processed wins)
3. Global configuration (`~/.config/uncomment/config.toml`)
4. Built-in defaults

Pattern-specific configurations override language configurations for matching files.
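As a minimal sketch of this precedence, settings can be pictured as layers applied from lowest to highest priority. The flat dictionaries and the key-by-key override below are assumptions for illustration; the real merge may treat nested tables or list values such as `preserve_patterns` differently.

```python
from typing import Any, Dict


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge settings key by key; later layers override earlier ones."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical values for each source, listed from lowest to highest precedence.
defaults = {"remove_todos": False, "remove_fixme": False, "remove_docs": False}
global_cfg = {"remove_docs": True}   # ~/.config/uncomment/config.toml
local_cfg = {"remove_todos": True}   # nearest .uncommentrc.toml
cli_flags = {"remove_docs": False}   # command-line flags

effective = merge_layers(defaults, global_cfg, local_cfg, cli_flags)
print(effective)
```

In this example the command-line value for `remove_docs` overrides the global file, while `remove_todos` comes from the local file because no higher layer sets it.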

## How It Works

Unlike regex-based tools, uncomment uses tree-sitter to build a proper Abstract Syntax Tree (AST) of your code. This means it understands the difference between:

- Real comments vs comment-like content in strings
- Documentation comments vs regular comments
- Inline comments vs standalone comments
- Language-specific metadata that should be preserved
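The first distinction is easy to demonstrate. The sketch below does not use tree-sitter: it uses Python's standard `tokenize` module as a stand-in for a syntax-aware parser, and compares it with a naive regex on a line whose string literal contains a `#`. The function names are illustrative only.

```python
import io
import re
import tokenize

SOURCE = 'url = "http://example.com/#anchor"  # real comment\n'


def strip_with_regex(src: str) -> str:
    """Naive approach: delete everything from the first '#' on each line."""
    stripped = (re.sub(r"#.*", "", line).rstrip() for line in src.splitlines())
    return "\n".join(stripped) + "\n"


def strip_with_tokenizer(src: str) -> str:
    """Syntax-aware approach: delete only spans the lexer classifies as COMMENT."""
    lines = src.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    comments = [tok for tok in tokens if tok.type == tokenize.COMMENT]
    for tok in reversed(comments):
        row, start_col, end_col = tok.start[0] - 1, tok.start[1], tok.end[1]
        line = lines[row]
        # Drop the comment and the whitespace that separated it from the code.
        lines[row] = line[:start_col].rstrip() + line[end_col:]
    return "".join(lines)


print(strip_with_regex(SOURCE), end="")      # string literal is truncated
print(strip_with_tokenizer(SOURCE), end="")  # string literal is intact
```

The regex cuts the URL at `#anchor` and leaves an unterminated string, while the token-based version removes only the trailing comment.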

## Architecture

The tool is built with a modular, extensible architecture:

1. **Language Registry**: Manages both built-in and dynamically loaded languages
2. **Grammar Manager**: Handles loading grammars from Git, local paths, or compiled libraries
3. **Configuration System**: TOML-based hierarchical configuration with merging
4. **AST Visitor**: Traverses the tree-sitter AST to find comments
5. **Preservation Engine**: Applies rules to determine what to keep
6. **Output Generator**: Produces clean code with comments removed
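Steps 4 to 6 can be pictured with a small sketch. It assumes the visitor has already produced non-overlapping comment spans as character offsets; the `Span` type, the `remove_comments` function, and the `keep` callback are hypothetical names, and the real tool's handling of surrounding whitespace is not modeled.

```python
from typing import Callable, Iterable, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) character offsets, end exclusive


def remove_comments(source: str, spans: Iterable[Span], keep: Callable[[str], bool]) -> str:
    """Delete every comment span that the preservation callback does not keep."""
    result = source
    # Work right to left so offsets of earlier spans remain valid in `result`.
    for start, end in sorted(spans, reverse=True):
        if keep(source[start:end]):
            continue
        result = result[:start] + result[end:]
    return result


source = "a = 1 # drop me\nb = 2 # TODO: keep me\n"
# In the real pipeline these spans would come from the AST visitor.
spans = [
    (source.index("# drop me"), source.index("# drop me") + len("# drop me")),
    (source.index("# TODO"), source.index("# TODO") + len("# TODO: keep me")),
]
cleaned = remove_comments(source, spans, keep=lambda text: "TODO" in text)
print(cleaned, end="")
```

Note that this sketch leaves the space before the removed comment in place; trimming such whitespace would be a separate step.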

### Key Components

- **Dynamic Grammar Loading**: Automatically downloads and compiles tree-sitter grammars
- **Grammar Caching**: Caches compiled grammars for performance
- **Configuration Discovery**: Searches for configs in project hierarchy
- **Pattern Matching**: File-pattern-specific configuration overrides

## Adding New Languages

With the new configuration system, you can add languages without modifying code:

### Method 1: Using Configuration (Recommended)

Add to your `.uncommentrc.toml`:

```toml
[languages.mylang]
name = "My Language"
extensions = ["ml", "mli"]
comment_nodes = ["comment"]
preserve_patterns = ["TODO", "FIXME"]

[languages.mylang.grammar]
source = { type = "git", url = "https://github.com/tree-sitter/tree-sitter-mylang", branch = "main" }
```

### Method 2: Built-in Support

For frequently used languages:

1. Add the tree-sitter parser dependency to `Cargo.toml`
2. Register the language in `src/grammar/mod.rs`
3. Add language configuration in `src/languages/registry.rs`

## Git Hooks

### Pre-commit

Add to your `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/Goldziher/uncomment
    rev: v2.5.0
    hooks:
      - id: uncomment
```

### Lefthook

Add to your `lefthook.yml`:

```yaml
pre-commit:
  commands:
    uncomment:
      run: uncomment {staged_files}
      stage_fixed: true
```

For both hooks, install uncomment via pip:

```bash
pip install uncomment
```

## Performance

Parsing adds some overhead compared with regex-based approaches, but the tool remains fast and benefits from parallel processing:

### Single-threaded Performance

- Small files (<1000 lines): ~20-30ms
- Large files (>10000 lines): ~100-200ms

### Parallel Processing Benchmarks

Throughput increases with thread count, though less than linearly:

| Thread Count | Files/Second | Speedup |
| ------------ | ------------ | ------- |
| 1 thread     | 1,500        | 1.0x    |
| 4 threads    | 3,900        | 2.6x    |
| 8 threads    | 5,100        | 3.4x    |

_Benchmarks run on a large enterprise codebase with 5,000 mixed language files_
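The Speedup column follows directly from the Files/Second column. The short calculation below reproduces it and also derives parallel efficiency (speedup divided by thread count), which is not part of the original table and is included only as a reading aid.

```python
# Files per second by thread count, taken from the table above.
FILES_PER_SECOND = {1: 1500, 4: 3900, 8: 5100}


def speedup(threads: int) -> float:
    """Throughput relative to the single-threaded run."""
    return FILES_PER_SECOND[threads] / FILES_PER_SECOND[1]


def efficiency(threads: int) -> float:
    """Speedup per thread; 1.0 would mean perfectly linear scaling."""
    return speedup(threads) / threads


for threads in FILES_PER_SECOND:
    print(
        f"{threads} thread(s): {speedup(threads):.1f}x speedup, "
        f"{efficiency(threads):.1%} efficiency"
    )
```

With these figures, efficiency is 65% at 4 threads and 42.5% at 8 threads, so each additional thread contributes less than the one before it.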

### Built-in Benchmarking

Use the built-in tools to measure performance on your specific codebase:

```bash
# Basic benchmark
uncomment benchmark --target /path/to/repo

# Detailed benchmark with multiple iterations
uncomment benchmark --target /path/to/repo --iterations 5 --threads 8

# Memory and performance profiling
uncomment profile /path/to/repo
```

The accuracy gained through AST parsing is worth the small performance cost, and parallel processing makes it suitable for even the largest codebases.

## License

MIT