Repository Languages can provide inaccurate percentages with some files (Puppet seen as Pascal)

Summary

The repository languages feature can report incorrect percentages for repositories that contain Puppet manifests (.pp). Some of these files are classified as "Pascal", which also uses the .pp extension.

For example, this test project (selected randomly because it contains a good mix of Puppet files), copied from a source on a competitor's Git hosting site, shows different percentages on each platform:

Source:      86.1% Puppet, 9.8% Shell, 3.1% Ruby
GitLab.com:  50% Pascal, 31.4% Puppet, 9.2% Shell, 5.4% M4

Additionally, we see different results when analyzing a local clone of the repository with several different tools. As noted in this epic, GitLab switched from github-linguist to go-enry in 15.x, but the go-enry results are not always accurate. I've also included SCC for comparison:

github-linguist

81.47%  108946     Puppet
9.23%   12349      Shell
5.38%   7193       M4
2.93%   3916       Ruby
0.99%   1321       HTML

go-enry

50.04%  Pascal
31.43%  Puppet
9.23%   Shell
5.38%   M4
2.93%   Ruby
0.99%   HTML+ERB
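
As a quick cross-check of the two outputs above, the following sketch recomputes the github-linguist percentages and compares them with the go-enry figures. It assumes the second column of the github-linguist output is a per-language byte count; the variable names are only for illustration.

```python
# Byte counts copied from the github-linguist output above
# (assumption: the second column is bytes per language).
linguist_bytes = {
    "Puppet": 108946,
    "Shell": 12349,
    "M4": 7193,
    "Ruby": 3916,
    "HTML": 1321,
}

# Percentages copied from the go-enry output above.
enry_percent = {
    "Pascal": 50.04,
    "Puppet": 31.43,
    "Shell": 9.23,
    "M4": 5.38,
    "Ruby": 2.93,
    "HTML+ERB": 0.99,
}

total_bytes = sum(linguist_bytes.values())
linguist_percent = {
    lang: round(100 * size / total_bytes, 2)
    for lang, size in linguist_bytes.items()
}

# Combined share that go-enry assigns to the two languages using ".pp".
enry_pp_share = round(enry_percent["Pascal"] + enry_percent["Puppet"], 2)

print("github-linguist:", linguist_percent)
print("go-enry Pascal + Puppet:", enry_pp_share)
```

If that assumption holds, go-enry's Pascal and Puppet shares add up to the same 81.47% that github-linguist reports for Puppet alone. This suggests the two tools agree on how much of the repository is made up of .pp files and differ only in how they classify a portion of them.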

SCC

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Puppet                     497      5481      534      2199     2748         48
Ruby                        23       165       23         0      142          2
Shell                       23       324       87        73      164         58
Ruby HTML                    3        41        9         0       32          0
Markdown                     2        72       19         0       53          0
Gemfile                      1        10        1         0        9          0
Monkey C                     1       176        0         0      176          9
Rakefile                     1        24        6         1       17          0
YAML                         1        14        0         0       14          0
gitignore                    1         1        0         0        1          0
───────────────────────────────────────────────────────────────────────────────
Total                      553      6308      679      2273     3356        117
───────────────────────────────────────────────────────────────────────────────

## Author note: measured by "Lines", Puppet accounts for about 86% of the repository; measured by "Code", about 81%.

The values from github-linguist and SCC are closer to the actual composition of the repository, and neither tool incorrectly guesses that Pascal is used, which is what go-enry currently does.
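
For reference, the Puppet share can be derived from the SCC table directly. This is a minimal sketch using only the Puppet and Total rows above; the function name is hypothetical.

```python
# Values copied from the "Puppet" and "Total" rows of the SCC table above.
puppet = {"lines": 5481, "code": 2748}
total = {"lines": 6308, "code": 3356}


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


puppet_share = {
    column: share(puppet[column], total[column])
    for column in ("lines", "code")
}

print(puppet_share)
```

Both figures come out just under 87% (lines) and just under 82% (code), roughly in line with the author note above and with the github-linguist result.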

Steps to reproduce

View, clone, or fork the example project to see the current percentages (also shown below).

Current Percentages
[Image: screenshot of the current language percentages]

Example Project

The same project as linked above.

What is the current bug behavior?

In repositories that contain Puppet files, the language calculation incorrectly classifies some of those files as Pascal.

What is the expected correct behavior?

Language percentage calculation is correct, or more accurate, in comparison to our competitors.

Output of checks

This bug happens on GitLab.com

Possible fixes

It looks like both enry and linguist include heuristics for distinguishing Puppet from Pascal. Is it possible that some of this detection is being skipped when a repository is checked or scanned?
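
To illustrate the kind of content-based check meant here, below is a rough sketch of how a .pp file could be disambiguated by looking for language-specific keywords. The patterns and the function name are hypothetical and chosen only for this example; they are not the actual rules used by enry or linguist.

```python
import re

# Hypothetical hint patterns for illustration only.
# These are NOT the real heuristics from enry or linguist.
PASCAL_HINTS = re.compile(
    r"^\s*(?:program|unit|uses|procedure|function)\b|^\s*end[.;]",
    re.IGNORECASE | re.MULTILINE,
)
PUPPET_HINTS = re.compile(
    r"^\s*(?:class|define|node|include)\b|=>",
    re.MULTILINE,
)


def guess_pp_language(source: str) -> str:
    """Guess whether the contents of a .pp file look like Pascal or Puppet."""
    pascal_hits = len(PASCAL_HINTS.findall(source))
    puppet_hits = len(PUPPET_HINTS.findall(source))
    if pascal_hits > puppet_hits:
        return "Pascal"
    if puppet_hits > pascal_hits:
        return "Puppet"
    return "Unknown"


puppet_sample = """class ntp {
  package { 'ntp':
    ensure => installed,
  }
}
"""

pascal_sample = """program Hello;
begin
  writeln('Hello');
end.
"""

print(guess_pp_language(puppet_sample))
print(guess_pp_language(pascal_sample))
```

If the extension is matched first and a content check like this is never reached, or is only applied to part of the file, every ambiguous .pp file could fall back to a single default language, which would be consistent with the split seen above.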

Edited by 🤖 GitLab Bot 🤖