Language Savant, Python clone of github/linguist.
pip install linguist
Linguist defines every language it knows in a YAML file. For a file to be highlighted, its language and lexer must be defined there.
Most languages are detected by their file extension. This is the fastest and most common situation.
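As a rough illustration of extension-based detection, the sketch below looks up a file's extension in a small table. The `EXTENSIONS` table and the `language_by_extension` helper are hypothetical names made up for this example; they are not part of the library's API, and the real language list is much larger.

```python
import os

# Hypothetical, heavily abbreviated extension table (an assumption for this
# sketch); the project keeps its real language list in a YAML file.
EXTENSIONS = {
    ".py": "Python",
    ".rb": "Ruby",
    ".js": "JavaScript",
}


def language_by_extension(filename):
    """Return a language name for the file's extension, or None if unknown."""
    _, ext = os.path.splitext(filename)
    return EXTENSIONS.get(ext.lower())


print(language_by_extension("test.py"))
print(language_by_extension("README"))
```

A plain dictionary lookup is why this path is fast: no file contents need to be read.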
To disambiguate files with common extensions, we use a Bayesian classifier. For example, this helps us tell apart .h files, which could be C, C++, or Obj-C.
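To make the idea concrete, here is a minimal naive Bayes sketch with add-one smoothing, trained on a few toy snippets. It is not the classifier this project ships: the `NaiveBayes` class, its tokenizer, and the training samples are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes over source tokens (illustrative only)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # language -> token -> count
        self.doc_counts = Counter()               # language -> number of samples
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        # Crude tokenizer: runs of letters plus '_', '#' and '@'.
        return re.findall(r"[A-Za-z_#@]+", text)

    def train(self, language, text):
        tokens = self.tokenize(text)
        self.token_counts[language].update(tokens)
        self.doc_counts[language] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the language with the highest log-posterior, or None if untrained."""
        tokens = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_language, best_score = None, float("-inf")
        for language, counts in self.token_counts.items():
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(self.doc_counts[language] / total_docs)
            total_tokens = sum(counts.values())
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_language, best_score = language, score
        return best_language


clf = NaiveBayes()
clf.train("C", '#include <stdio.h>\nint main(void) { printf("hi"); return 0; }')
clf.train("C++", "#include <iostream>\nclass Foo { public: virtual ~Foo(); };\n"
                 "namespace bar { template <typename T> class Baz; }")
clf.train("Objective-C", "#import <Foundation/Foundation.h>\n@interface Foo : NSObject\n"
                         "@property NSString *name;\n@end")

print(clf.classify("@interface Bar : NSObject\n@end"))
```

Working in log space avoids underflow when many small token probabilities are multiplied together.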
For testing, there is a simple FileBlob API:
    from linguist.libs.file_blob import FileBlob

    FileBlob('test.py').language.name    #=> 'Python'
    FileBlob('test_file').language.name  #=> 'Python'
The actual syntax highlighting is handled by Pygments. It also provides a Lexer abstraction that determines which highlighter should be used on a file.
The Language Graph you see on every repository is built by aggregating the languages of all of the repository's blobs.
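The aggregation step might look roughly like the sketch below. It assumes each blob contributes its size in bytes to its language's total, which the text above does not state. The `aggregate_languages` and `primary_language` helpers and the sample data are hypothetical, not the library's stats API.

```python
from collections import Counter


def aggregate_languages(blobs):
    """Sum sizes per language; `blobs` is an iterable of (language, size) pairs."""
    totals = Counter()
    for language, size in blobs:
        if language is not None:  # skip blobs with no detected language
            totals[language] += size
    return dict(totals)


def primary_language(totals):
    """Return the language with the largest total, or None for an empty repo."""
    return max(totals, key=totals.get) if totals else None


# Made-up sample data: (language, size in bytes)
sample = [("Python", 1200), ("JavaScript", 300), ("Python", 500), (None, 50)]
totals = aggregate_languages(sample)
print(totals)
print(primary_language(totals))
```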
The repository stats API can be used on a directory:
These stats are also printed out by the binary. Try running
Checking third-party code into your Git repo is common practice, but it often inflates your project's language stats and may even cause your project to be labeled as another language. We are able to identify some of these files and directories and exclude them.
    from linguist.libs.file_blob import FileBlob

    FileBlob('static/js/jquery-2.0.0.min.js').is_vendored  #=> True
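One plausible way to implement such a check is to match paths against a list of regular expressions. The sketch below assumes that approach. The patterns and the `looks_vendored` helper are invented for illustration and are not the project's actual vendor list.

```python
import re

# Hypothetical patterns (assumptions for this sketch, not the real vendor list).
VENDOR_PATTERNS = [
    re.compile(r"(^|/)vendor/"),
    re.compile(r"(^|/)node_modules/"),
    re.compile(r"(^|/)jquery([^/]*)\.js$"),
]


def looks_vendored(path):
    """Return True if any vendor pattern matches the given path."""
    return any(pattern.search(path) for pattern in VENDOR_PATTERNS)


print(looks_vendored("static/js/jquery-2.0.0.min.js"))
print(looks_vendored("src/app.py"))
```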
    from linguist.libs.file_blob import FileBlob

    FileBlob('jquery-2.0.0.min.js').is_generated  #=> True
    FileBlob('app.coffee').is_generated           #=> True
* Fork the repository.
* Create a topic branch.
* Implement your feature or bug fix.
* Add, commit, and push your changes.
* Submit a pull request.
    cd tests/
    python run.py