Language Savant. If your repository's language is being reported incorrectly, send us a pull request!

Linguist

This library is used on GitHub.com to detect blob languages, ignore binary or vendored files, suppress generated files in diffs, and generate language breakdown graphs.

Documentation

Installation

Install the gem:

gem install github-linguist

Dependencies

Linguist is a Ruby library, so you will need a recent version of Ruby installed. The version of Ruby supplied with macOS/Xcode has known problems that prevent some of the dependencies from installing. Accordingly, we highly recommend you install Ruby using Homebrew, rbenv, rvm, ruby-build, asdf, or another packaging system before attempting to install Linguist and its dependencies.

Linguist uses charlock_holmes for character encoding and rugged for libgit2 bindings for Ruby. These components have their own dependencies.

  1. charlock_holmes
  2. rugged

You may need to install missing dependencies before you can install Linguist. For example, on macOS with Homebrew:

brew install cmake pkg-config icu4c

On Ubuntu:

sudo apt-get install build-essential cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libssl-dev ruby-dev

Usage

Application usage

Linguist can be used in your application as follows:

require 'rugged'
require 'linguist'

repo = Rugged::Repository.new('.')
project = Linguist::Repository.new(repo, repo.head.target_id)
project.language       #=> "Ruby"
project.languages      #=> { "Ruby" => 119387 }

Command line usage

Git Repository

A repository's language stats can also be assessed from the command line using the github-linguist executable. Without any options, github-linguist outputs the language breakdown by percentage and file size.

cd /path-to-repository
github-linguist

You can try running github-linguist on the root directory in this repository itself:

$ github-linguist
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile
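
The percentage column appears to be each language's byte count divided by the total across all languages. As a minimal sketch under that assumption, the following Ruby reproduces the percentages from the byte counts shown above. The `percentages` helper is hypothetical and not part of Linguist; the hash has the same shape as the one `project.languages` returns.

```ruby
# Byte counts per language, copied from the sample output above.
# The hash shape mirrors what `project.languages` returns.
sizes = {
  "Ruby"       => 264519,
  "C"          => 97685,
  "Go"         => 25999,
  "Lex"        => 5098,
  "Shell"      => 1257,
  "Dockerfile" => 1212
}

# Hypothetical helper (not a Linguist API): convert byte counts into
# percentage strings with two decimal places.
def percentages(sizes)
  total = sizes.values.sum.to_f
  sizes.map { |language, bytes| [language, format("%.2f", bytes / total * 100)] }.to_h
end

result = percentages(sizes)
result.each { |language, pct| puts "#{pct}%  #{language}" }
```

With the sample byte counts, the computed values match the percentage column printed by github-linguist.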

Additional options

--rev REV

The --rev REV flag will change the git revision being analyzed to any gitrevisions(1) compatible revision you specify.

This is useful to analyze the makeup of a repo as of a certain tag, or in a certain branch.

For example, here is the popular Jekyll open source project.

$ github-linguist jekyll

70.64%  709959     Ruby
23.04%  231555     Gherkin
3.80%   38178      JavaScript
1.19%   11943      HTML
0.79%   7900       Shell
0.23%   2279       Dockerfile
0.13%   1344       Earthly
0.10%   1019       CSS
0.06%   606        SCSS
0.02%   234        CoffeeScript
0.01%   90         Hack

And here is Jekyll's published website, from the gh-pages branch inside their repository.

$ github-linguist jekyll --rev origin/gh-pages
100.00% 2568354    HTML

--breakdown

The --breakdown or -b flag will additionally show the breakdown of files by language.

You can try running github-linguist on the root directory in this repository itself:

$ github-linguist --breakdown
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Ruby:
Gemfile
Rakefile
bin/git-linguist
bin/github-linguist
ext/linguist/extconf.rb
github-linguist.gemspec
lib/linguist.rb

--json

The --json or -j flag outputs the data in JSON format.

$ github-linguist --json
{"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}
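
The JSON output is convenient to consume from a script. As a rough sketch, the Ruby below parses the sample above with the standard `json` library and renders it back into aligned text, largest language first. The `render_report` helper and its column widths are assumptions made for illustration, not part of Linguist. Note that `percentage` is a string in the JSON, while `size` is an integer.

```ruby
require "json"

# Sample copied from the `github-linguist --json` output above.
raw = <<~JSON
  {"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}
JSON

# Hypothetical helper (not a Linguist API): turn the JSON report into
# aligned text lines, sorted by size in descending order.
def render_report(json_text)
  JSON.parse(json_text)
      .sort_by { |_language, stats| -stats["size"] }
      .map do |language, stats|
        # Left-align the percentage in 8 columns and the size in 11.
        format("%-8s%-11d%s", "#{stats["percentage"]}%", stats["size"], language)
      end
end

lines = render_report(raw)
puts lines
```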

This option can be used in conjunction with --breakdown to get a full list of files along with the size and percentage data.

$ github-linguist --breakdown --json
{"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},"Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb",...]}}
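
One possible use of the combined output is to look up the detected language of a given file. The sketch below inverts the `files` arrays into a path-to-language map. It assumes the JSON shape shown above; the `language_by_file` helper is hypothetical, and the sample is trimmed to a few of the files visible in the output, so the real lists are longer.

```ruby
require "json"

# Trimmed sample: only some of the files visible in the output above are
# included. The actual Ruby file list is longer.
raw = <<~JSON
  {"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},
   "Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","lib/linguist.rb"]}}
JSON

# Hypothetical helper (not a Linguist API): map each file path to the
# language it was counted under.
def language_by_file(json_text)
  JSON.parse(json_text).each_with_object({}) do |(language, stats), index|
    # `files` is only present when --breakdown is passed, so default to [].
    stats.fetch("files", []).each { |path| index[path] = language }
  end
end

index = language_by_file(raw)
puts index["tools/grammars/Dockerfile"]
puts index["lib/linguist.rb"]
```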

Single file

Alternatively, you can find stats for a single file using the github-linguist executable.

You can try running github-linguist on files in this repository itself:

$ github-linguist grammars.yml
grammars.yml: 884 lines (884 sloc)
  type:      Text
  mime type: text/x-yaml
  language:  YAML
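
If you need the single-file report in a script, the text can be parsed with a few lines of Ruby. This is only a sketch that assumes the layout shown above (a header line followed by `key: value` fields); the `parse_file_report` helper is hypothetical, and the layout may differ for other file types or Linguist versions.

```ruby
# Sample copied from the single-file output above.
report = <<~TEXT
  grammars.yml: 884 lines (884 sloc)
    type:      Text
    mime type: text/x-yaml
    language:  YAML
TEXT

# Hypothetical helper (not a Linguist API): parse the report into a Hash.
def parse_file_report(text)
  header, *fields = text.lines.map(&:strip).reject(&:empty?)
  match = header.match(/\A(?<path>.+): (?<lines>\d+) lines \((?<sloc>\d+) sloc\)\z/)
  raise ArgumentError, "unrecognized header: #{header}" unless match

  result = {
    "path"  => match[:path],
    "lines" => match[:lines].to_i,
    "sloc"  => match[:sloc].to_i
  }
  # Remaining lines are "key: value" pairs; split on the first colon only.
  fields.each do |line|
    key, value = line.split(":", 2)
    result[key.strip] = value.to_s.strip
  end
  result
end

parsed = parse_file_report(report)
puts parsed["language"]
```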

Docker

If you have Docker installed, you can build an image and run Linguist within a container:

$ docker build -t linguist .
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist github-linguist --breakdown
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Ruby:
Gemfile
Rakefile
bin/git-linguist
bin/github-linguist
ext/linguist/extconf.rb
github-linguist.gemspec
lib/linguist.rb

Contributing

Please check out our contributing guidelines.

License

The language grammars included in this gem are covered by their repositories' respective licenses. vendor/README.md lists the repository for each grammar.

All other files are covered by the MIT license; see LICENSE.