

Glasses 😎


Compact, concise and customizable deep learning computer vision library

Models are stored on the Hugging Face Hub!

Doc is here

TL;DR

This library has

  • human-readable code, no research code
  • common components shared across models
  • the same API for all models (you learn it once and it is always the same)
  • clear and easy-to-use model customization (see here)
  • classification and segmentation
  • an emoji in the name ;)

Stuff implemented so far:

Installation

You can install glasses using pip by running

pip install git+https://github.com/FrancescoSaverioZuppichini/glasses

Motivations

Almost all existing implementations of the most famous models are written with poor coding practices, what is today called research code. I struggled to understand some of the implementations even though, in the end, they were just a few lines of code.

Most of them lack a global structure, contain lots of code repetition, are not easily customizable, and are not tested. Since I do computer vision for a living, I needed a way to make my life easier.

Getting started

The API is shared across all models!

import torch
from glasses.models import AutoModel, AutoTransform
# load one model
model = AutoModel.from_pretrained('resnet18').eval()
# and its correct input transformation
tr = AutoTransform.from_name('resnet18')
model.summary(device='cpu' ) # thanks to torchinfo
# at any time, see all the models
AutoModel.models_table() 
            Models                 
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Name                   ┃ Pretrained ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ resnet18               │ true       │
│ resnet26               │ true       │
│ resnet26d              │ true       │
│ resnet34               │ true       │
│ resnet34d              │ true       │
│ resnet50               │ true       │
...

Interpretability

import requests
from PIL import Image
from io import BytesIO
from glasses.interpretability import GradCam, SaliencyMap
from torchvision.transforms import Normalize
# get a cute dog 🐶
r = requests.get('https://i.insider.com/5df126b679d7570ad2044f3e?width=700&format=jpeg&auto=webp')
im = Image.open(BytesIO(r.content))
# un-normalize when done
mean, std = tr.transforms[-1].mean, tr.transforms[-1].std
postprocessing = Normalize(-mean / std, (1.0 / std))
# apply preprocessing
x =  tr(im).unsqueeze(0)
_ = model.interpret(x, using=GradCam(), postprocessing=postprocessing).show()
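The postprocessing above works because Normalize(-mean/std, 1/std) is the algebraic inverse of Normalize(mean, std). A minimal plain-Python sketch of the arithmetic, using scalars per channel instead of tensors:

```python
def normalize(x, mean, std):
    # torchvision-style Normalize: per-channel (x - mean) / std
    return [(v - m) / s for v, m, s in zip(x, mean, std)]

def unnormalize(y, mean, std):
    # Normalize(-mean/std, 1/std) applied to y computes
    # (y + mean/std) * std = y*std + mean, which undoes the first step
    inv_mean = [-m / s for m, s in zip(mean, std)]
    inv_std = [1.0 / s for s in std]
    return normalize(y, inv_mean, inv_std)

mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
x = [0.2, 0.5, 0.8]  # a fake 3-channel pixel
x_back = unnormalize(normalize(x, mean, std), mean, std)
```

Round-tripping any pixel through `normalize` and then `unnormalize` recovers the original value up to floating-point noise.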

(Grad-CAM heatmap over the input image)

Classification

from glasses.models import ResNet
from torch import nn
# change activation
model = AutoModel.from_pretrained('resnet18', activation = nn.SELU).eval()
# or directly from the model class
ResNet.resnet18(activation = nn.SELU)
# change number of classes
ResNet.resnet18(n_classes=100)
# freeze only the convolution weights
model = AutoModel.from_pretrained('resnet18') # or also ResNet.resnet18(pretrained=True) 
model.freeze(who=model.encoder)

Get the inner features

# model.encoder has special hooks ready to be activated
# call the .features to trigger them
model.encoder.features
x = torch.randn((1, 3, 224, 224))
model(x)
[f.shape for f in model.encoder.features]

Change inner block

# what about resnet with inverted residuals?
from glasses.models.classification.efficientnet import InvertedResidualBlock
ResNet.resnet18(block = InvertedResidualBlock)

Segmentation

from functools import partial
from glasses.models.segmentation.unet import UNet, UNetDecoder
# vanilla Unet
unet = UNet()
# let's change the encoder
unet = UNet.from_encoder(partial(AutoModel.from_name, 'efficientnet_b1'))
# mmm I want more layers in the decoder!
unet = UNet(decoder=partial(UNetDecoder, widths=[256, 128, 64, 32, 16]))
# maybe resnet was better
unet = UNet(encoder=lambda **kwargs: ResNet.resnet26(**kwargs).encoder)
# same API
# unet.summary(input_shape=(1,224,224))


More examples

# change the decoder part
model = ResNet.resnet18(pretrained=True)
my_head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(model.encoder.widths[-1], 512),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.Linear(512, 1000))

model.head = my_head

x = torch.rand((1,3,224,224))
model(x).shape #torch.Size([1, 1000])

Pretrained Models

I am currently working on the pretrained models and on the best way to make them available.

This is a list of all the pretrained models available so far! They are all trained on ImageNet.

I used a batch_size=64 and a GTX 1080ti to evaluate the models.

| Model | top1 | top5 | time | batch_size |
| --- | --- | --- | --- | --- |
| vit_base_patch16_384 | 0.842 | 0.9722 | 1130.816 | 64 |
| vit_large_patch16_224 | 0.82836 | 0.96406 | 893.486 | 64 |
| eca_resnet50t | 0.82234 | 0.96172 | 241.754 | 64 |
| eca_resnet101d | 0.82166 | 0.96052 | 213.632 | 64 |
| efficientnet_b3 | 0.82034 | 0.9603 | 199.599 | 64 |
| regnety_032 | 0.81958 | 0.95964 | 136.518 | 64 |
| vit_base_patch32_384 | 0.8166 | 0.9613 | 243.234 | 64 |
| vit_base_patch16_224 | 0.815 | 0.96018 | 306.686 | 64 |
| deit_small_patch16_224 | 0.81082 | 0.95316 | 132.868 | 64 |
| eca_resnet50d | 0.80604 | 0.95322 | 135.567 | 64 |
| resnet50d | 0.80492 | 0.95128 | 97.5827 | 64 |
| cse_resnet50 | 0.80292 | 0.95048 | 108.765 | 64 |
| efficientnet_b2 | 0.80126 | 0.95124 | 127.177 | 64 |
| eca_resnet26t | 0.79862 | 0.95084 | 155.396 | 64 |
| regnety_064 | 0.79712 | 0.94774 | 183.065 | 64 |
| regnety_040 | 0.79222 | 0.94656 | 124.881 | 64 |
| resnext101_32x8d | 0.7921 | 0.94556 | 290.38 | 64 |
| regnetx_064 | 0.79066 | 0.94456 | 176.3 | 64 |
| wide_resnet101_2 | 0.7891 | 0.94344 | 277.755 | 64 |
| regnetx_040 | 0.78486 | 0.94242 | 122.619 | 64 |
| wide_resnet50_2 | 0.78464 | 0.94064 | 201.634 | 64 |
| efficientnet_b1 | 0.7831 | 0.94096 | 98.7143 | 64 |
| resnet152 | 0.7825 | 0.93982 | 186.191 | 64 |
| regnetx_032 | 0.7792 | 0.93996 | 319.558 | 64 |
| resnext50_32x4d | 0.77628 | 0.9368 | 114.325 | 64 |
| regnety_016 | 0.77604 | 0.93702 | 96.547 | 64 |
| efficientnet_b0 | 0.77332 | 0.93566 | 67.2147 | 64 |
| resnet101 | 0.77314 | 0.93556 | 134.148 | 64 |
| densenet161 | 0.77146 | 0.93602 | 239.388 | 64 |
| resnet34d | 0.77118 | 0.93418 | 59.9938 | 64 |
| densenet201 | 0.76932 | 0.9339 | 158.514 | 64 |
| regnetx_016 | 0.76684 | 0.9328 | 91.7536 | 64 |
| resnet26d | 0.766 | 0.93188 | 70.6453 | 64 |
| regnety_008 | 0.76238 | 0.93026 | 54.1286 | 64 |
| resnet50 | 0.76012 | 0.92934 | 89.7976 | 64 |
| densenet169 | 0.75628 | 0.9281 | 127.077 | 64 |
| resnet26 | 0.75394 | 0.92584 | 65.5801 | 64 |
| resnet34 | 0.75096 | 0.92246 | 56.8985 | 64 |
| regnety_006 | 0.75068 | 0.92474 | 55.5611 | 64 |
| regnetx_008 | 0.74788 | 0.92194 | 57.9559 | 64 |
| densenet121 | 0.74472 | 0.91974 | 104.13 | 64 |
| deit_tiny_patch16_224 | 0.7437 | 0.91898 | 66.662 | 64 |
| vgg19_bn | 0.74216 | 0.91848 | 169.357 | 64 |
| regnety_004 | 0.73766 | 0.91638 | 68.4893 | 64 |
| regnetx_006 | 0.73682 | 0.91568 | 81.4703 | 64 |
| vgg16_bn | 0.73476 | 0.91536 | 150.317 | 64 |
| vgg19 | 0.7236 | 0.9085 | 155.851 | 64 |
| regnetx_004 | 0.72298 | 0.90644 | 58.0049 | 64 |
| vgg16 | 0.71628 | 0.90368 | 135.398 | 64 |
| vgg13_bn | 0.71618 | 0.9036 | 129.077 | 64 |
| efficientnet_lite0 | 0.7041 | 0.89894 | 62.4211 | 64 |
| vgg11_bn | 0.70408 | 0.89724 | 86.9459 | 64 |
| vgg13 | 0.69984 | 0.89306 | 116.052 | 64 |
| regnety_002 | 0.6998 | 0.89422 | 46.804 | 64 |
| resnet18 | 0.69644 | 0.88982 | 46.2029 | 64 |
| vgg11 | 0.68872 | 0.88658 | 79.4136 | 64 |
| regnetx_002 | 0.68658 | 0.88244 | 45.9211 | 64 |

Assuming you want to load efficientnet_b1:

from glasses.models import EfficientNet, AutoModel, AutoTransform

# load it using AutoModel
model = AutoModel.from_pretrained('efficientnet_b1').eval()
# or from its own class
model = EfficientNet.efficientnet_b1(pretrained=True)
# you may also need to get the correct transformation that must be applied on the input
tr = AutoTransform.from_name('efficientnet_b1')

In this case, tr is

Compose(
    Resize(size=240, interpolation=PIL.Image.BICUBIC)
    CenterCrop(size=(240, 240))
    ToTensor()
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
)
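Passing an integer to Resize scales the shorter side to that length while keeping the aspect ratio, and CenterCrop then cuts a centered 240×240 patch. A plain-Python sketch of that geometry (the helper names are mine, not torchvision's):

```python
def resize_shorter_side(w, h, size):
    # torchvision Resize with an int resizes the *shorter* side to `size`,
    # keeping the aspect ratio
    if w < h:
        return size, int(size * h / w)
    return int(size * w / h), size

def center_crop_box(w, h, size):
    # (left, top, right, bottom) of a centered size x size crop
    left = (w - size) // 2
    top = (h - size) // 2
    return left, top, left + size, top + size

w, h = resize_shorter_side(640, 480, 240)  # 480 is the shorter side -> (320, 240)
box = center_crop_box(w, h, 240)           # centered 240x240 window
```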

Deep Customization

All models are composed of sharable parts:

  • Block
  • Layer
  • Encoder
  • Head
  • Decoder

Block

Each model has its own building block, denoted by *Block. In each block, all the weights live in the .block field. This makes it very easy to customize one specific model.

from glasses.models.classification.vgg import VGGBasicBlock
from glasses.models.classification.resnet import ResNetBasicBlock, ResNetBottleneckBlock, ResNetBasicPreActBlock, ResNetBottleneckPreActBlock
from glasses.models.classification.senet import SENetBasicBlock, SENetBottleneckBlock
from glasses.models.classification.resnetxt import ResNetXtBottleNeckBlock
from glasses.models.classification.densenet import DenseBottleNeckBlock
from glasses.models.classification.wide_resnet import WideResNetBottleNeckBlock
from glasses.models.classification.efficientnet import EfficientNetBasicBlock

For example, if we want to add Squeeze and Excitation to the resnet bottleneck block, we can just

from glasses.nn.att import SpatialSE
from  glasses.models.classification.resnet import ResNetBottleneckBlock

class SEResNetBottleneckBlock(ResNetBottleneckBlock):
    def __init__(self, in_features: int, out_features: int, squeeze: int = 16, *args, **kwargs):
        super().__init__(in_features, out_features, *args, **kwargs)
        # all the weights are in block, we want to apply se after the weights
        self.block.add_module('se', SpatialSE(out_features, reduction=squeeze))
        
SEResNetBottleneckBlock(32, 64)
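To see what such an attention module actually computes, here is a rough, hypothetical sketch of channel squeeze-and-excitation in plain PyTorch (the real glasses.nn.att.SpatialSE may differ in its details): global-average-pool each channel to a scalar, pass it through a small bottleneck, and use the resulting sigmoid gates to rescale the channels.

```python
import torch
from torch import nn

class MiniSpatialSE(nn.Module):
    """Hypothetical minimal squeeze-and-excitation sketch, not glasses' own code."""
    def __init__(self, features: int, reduction: int = 16):
        super().__init__()
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                               # squeeze: one scalar per channel
            nn.Conv2d(features, features // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features // reduction, features, kernel_size=1),
            nn.Sigmoid(),                                          # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.att(x)                                     # excite: rescale each channel

x = torch.randn(1, 64, 8, 8)
out = MiniSpatialSE(64)(x)  # same shape as x, each channel scaled by its gate
```

Because the gates are in (0, 1), the block can only attenuate channels, never amplify them.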

Then, we can use the class methods to create new models following the existing architecture blueprint; for example, to create se_resnet50

ResNet.resnet50(block=SEResNetBottleneckBlock)

The cool thing is that each model has the same API; if I want to create a vgg13 with the SEResNetBottleneckBlock, I can just

from glasses.models import VGG
model = VGG.vgg13(block=SEResNetBottleneckBlock)
model.summary()

Some specific models may require additional parameters in the block; for example, MobileNetV2 also requires an expansion parameter, so our SEResNetBottleneckBlock won't work there.

Layer

A Layer is a collection of blocks; it is used to stack multiple blocks together following some logic. For example, ResNetLayer

from glasses.models.classification.resnet import ResNetLayer

ResNetLayer(64, 128, depth=2)

Encoder

The encoder is the part that encodes the input into features, i.e. the convolution layers. It always has two very important parameters.

  • widths
  • depths

widths is the width at each layer, so how many features there are; depths is the depth at each layer, so how many blocks there are.

For example, ResNetEncoder will create multiple ResNetLayers based on the lengths of widths and depths. Let's see an example.

from glasses.models.classification.resnet import ResNetEncoder
# 3 layers, with 32,64,128 features and 1,2,3 block each
ResNetEncoder(
    widths=[32,64,128],
    depths=[1,2,3])
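The widths/depths pairing can be illustrated in plain Python. This is only a sketch of the bookkeeping, not the real ResNetEncoder (which also has a stem and downsampling between stages):

```python
widths = [32, 64, 128]
depths = [1, 2, 3]

# each stage maps the previous stage's width to its own width
# and stacks depths[i] blocks
in_out = list(zip([widths[0]] + widths[:-1], widths))
stages = [
    {"in": i, "out": o, "blocks": d}
    for (i, o), d in zip(in_out, depths)
]
# first stage: 32 -> 32 with 1 block, then 32 -> 64 with 2 blocks,
# then 64 -> 128 with 3 blocks
```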

All encoders are subclasses of Encoder, which allows us to hook into specific stages to get the features. All you have to do is first call .features to notify the model that you want to receive the features, and then pass an input.

enc = ResNetEncoder()
enc.features
enc(torch.randn((1,3,224,224)))
print([f.shape for f in enc.features])

Remember, each model always has an .encoder field. The encoder knows its output feature sizes; you can access them through .widths:

from glasses.models import ResNet

model = ResNet.resnet18()
model.encoder.widths[-1]

Features

Each encoder can return a list of features accessible through the .features field. You need to call it once beforehand to notify the encoder that you wish to store the features.

from glasses.models.classification.resnet import ResNetEncoder

x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
enc(x)
features = enc.features # now we have all the features from each layer (stage)
[print(f.shape) for f in features]
# torch.Size([1, 64, 112, 112])
# torch.Size([1, 64, 56, 56])
# torch.Size([1, 128, 28, 28])
# torch.Size([1, 256, 14, 14])
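This kind of feature collection is typically built on PyTorch forward hooks. Here is a minimal, self-contained sketch of the mechanism using a toy two-stage encoder (this is not glasses' actual implementation):

```python
import torch
from torch import nn

# a toy 2-stage "encoder"
encoder = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU()),
)

features = []

def save_output(module, inputs, output):
    # forward hooks receive (module, inputs, output) on every forward pass
    features.append(output)

handles = [stage.register_forward_hook(save_output) for stage in encoder]

x = torch.randn(1, 3, 32, 32)
encoder(x)          # the hooks fill `features` with one tensor per stage

for h in handles:
    h.remove()      # detach the hooks when done
```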

Head

Head is the last part of the model; it usually performs the classification.

from glasses.models.classification.resnet import ResNetHead


ResNetHead(512, n_classes=1000)

Decoder

The decoder takes the last feature from the .encoder and decodes it. This is usually done in segmentation models, such as UNet.

from glasses.models.segmentation.unet import UNetDecoder
x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
x = enc(x)
features = enc.features
# we need to tell the decoder the first feature size and the size of the lateral features
dec = UNetDecoder(start_features=enc.widths[-1],
                  lateral_widths=enc.features_widths[::-1])
out = dec(x, features[::-1])
out.shape
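One decoder stage combines an upsampled feature with the matching lateral (skip) feature from the encoder. A hypothetical minimal sketch of a single U-Net-style decoder stage (names and details are mine, not UNetDecoder's):

```python
import torch
from torch import nn

class MiniDecoderStage(nn.Module):
    """Hypothetical sketch: upsample, concatenate the skip feature, convolve."""
    def __init__(self, in_ch: int, lateral_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + lateral_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, lateral):
        x = self.up(x)                      # double the spatial size
        x = torch.cat([x, lateral], dim=1)  # merge with the skip connection
        return self.conv(x)

stage = MiniDecoderStage(in_ch=256, lateral_ch=128, out_ch=128)
x = torch.randn(1, 256, 7, 7)        # last encoder feature
skip = torch.randn(1, 128, 14, 14)   # lateral feature from an earlier stage
out = stage(x, skip)                 # spatial size matches the skip feature
```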

This object-oriented structure allows most of the code to be reused across models. The table below lists each model's parameter count and size:

| Name | Parameters | Size (MB) |
| --- | --- | --- |
| cse_resnet101 | 49,326,872 | 188.17 |
| cse_resnet152 | 66,821,848 | 254.91 |
| cse_resnet18 | 11,778,592 | 44.93 |
| cse_resnet34 | 21,958,868 | 83.77 |
| cse_resnet50 | 28,088,024 | 107.15 |
| deit_base_patch16_224 | 87,184,592 | 332.58 |
| deit_base_patch16_384 | 87,186,128 | 357.63 |
| deit_small_patch16_224 | 22,359,632 | 85.3 |
| deit_tiny_patch16_224 | 5,872,400 | 22.4 |
| densenet121 | 7,978,856 | 30.44 |
| densenet161 | 28,681,000 | 109.41 |
| densenet169 | 14,149,480 | 53.98 |
| densenet201 | 20,013,928 | 76.35 |
| eca_resnet101d | 44,568,563 | 212.62 |
| eca_resnet101t | 44,566,027 | 228.65 |
| eca_resnet18d | 16,014,452 | 98.41 |
| eca_resnet18t | 1,415,684 | 37.91 |
| eca_resnet26d | 16,014,452 | 98.41 |
| eca_resnet26t | 16,011,916 | 114.44 |
| eca_resnet50d | 25,576,350 | 136.65 |
| eca_resnet50t | 25,573,814 | 152.68 |
| efficientnet_b0 | 5,288,548 | 20.17 |
| efficientnet_b1 | 7,794,184 | 29.73 |
| efficientnet_b2 | 9,109,994 | 34.75 |
| efficientnet_b3 | 12,233,232 | 46.67 |
| efficientnet_b4 | 19,341,616 | 73.78 |
| efficientnet_b5 | 30,389,784 | 115.93 |
| efficientnet_b6 | 43,040,704 | 164.19 |
| efficientnet_b7 | 66,347,960 | 253.1 |
| efficientnet_b8 | 87,413,142 | 505.01 |
| efficientnet_l2 | 480,309,308 | 2332.13 |
| efficientnet_lite0 | 4,652,008 | 17.75 |
| efficientnet_lite1 | 5,416,680 | 20.66 |
| efficientnet_lite2 | 6,092,072 | 23.24 |
| efficientnet_lite3 | 8,197,096 | 31.27 |
| efficientnet_lite4 | 13,006,568 | 49.62 |
| fishnet150 | 24,960,808 | 95.22 |
| fishnet99 | 16,630,312 | 63.44 |
| mobilenet_v2 | 3,504,872 | 24.51 |
| mobilenetv2 | 3,504,872 | 13.37 |
| regnetx_002 | 2,684,792 | 10.24 |
| regnetx_004 | 5,157,512 | 19.67 |
| regnetx_006 | 6,196,040 | 23.64 |
| regnetx_008 | 7,259,656 | 27.69 |
| regnetx_016 | 9,190,136 | 35.06 |
| regnetx_032 | 15,296,552 | 58.35 |
| regnetx_040 | 22,118,248 | 97.66 |
| regnetx_064 | 26,209,256 | 114.02 |
| regnetx_080 | 34,561,448 | 147.43 |
| regnety_002 | 3,162,996 | 12.07 |
| regnety_004 | 4,344,144 | 16.57 |
| regnety_006 | 6,055,160 | 23.1 |
| regnety_008 | 6,263,168 | 23.89 |
| regnety_016 | 11,202,430 | 42.73 |
| regnety_032 | 19,436,338 | 74.14 |
| regnety_040 | 20,646,656 | 91.77 |
| regnety_064 | 30,583,252 | 131.52 |
| regnety_080 | 39,180,068 | 165.9 |
| resnest101e | 48,275,016 | 184.15 |
| resnest14d | 10,611,688 | 40.48 |
| resnest200e | 70,201,544 | 267.8 |
| resnest269e | 7,551,112 | 28.81 |
| resnest26d | 17,069,448 | 65.11 |
| resnest50d | 27,483,240 | 104.84 |
| resnest50d_1s4x24d | 25,677,000 | 97.95 |
| resnest50d_4s2x40d | 30,417,592 | 116.03 |
| resnet101 | 44,549,160 | 169.94 |
| resnet152 | 60,192,808 | 229.62 |
| resnet18 | 11,689,512 | 44.59 |
| resnet200 | 64,673,832 | 246.71 |
| resnet26 | 15,995,176 | 61.02 |
| resnet26d | 16,014,408 | 61.09 |
| resnet34 | 21,797,672 | 83.15 |
| resnet34d | 21,816,904 | 83.22 |
| resnet50 | 25,557,032 | 97.49 |
| resnet50d | 25,576,264 | 97.57 |
| resnext101_32x16d | 194,026,792 | 740.15 |
| resnext101_32x32d | 468,530,472 | 1787.3 |
| resnext101_32x48d | 828,411,176 | 3160.14 |
| resnext101_32x8d | 88,791,336 | 338.71 |
| resnext50_32x4d | 25,028,904 | 95.48 |
| se_resnet101 | 49,292,328 | 188.04 |
| se_resnet152 | 66,770,984 | 254.71 |
| se_resnet18 | 11,776,552 | 44.92 |
| se_resnet34 | 21,954,856 | 83.75 |
| se_resnet50 | 28,071,976 | 107.09 |
| unet | 23,202,530 | 88.51 |
| vgg11 | 132,863,336 | 506.83 |
| vgg11_bn | 132,868,840 | 506.85 |
| vgg13 | 133,047,848 | 507.54 |
| vgg13_bn | 133,053,736 | 507.56 |
| vgg16 | 138,357,544 | 527.79 |
| vgg16_bn | 138,365,992 | 527.82 |
| vgg19 | 143,667,240 | 548.05 |
| vgg19_bn | 143,678,248 | 548.09 |
| vit_base_patch16_224 | 86,415,592 | 329.65 |
| vit_base_patch16_384 | 86,415,592 | 329.65 |
| vit_base_patch32_384 | 88,185,064 | 336.4 |
| vit_huge_patch16_224 | 631,823,080 | 2410.21 |
| vit_huge_patch32_384 | 634,772,200 | 2421.46 |
| vit_large_patch16_224 | 304,123,880 | 1160.14 |
| vit_large_patch16_384 | 304,123,880 | 1160.14 |
| vit_large_patch32_384 | 306,483,176 | 1169.14 |
| vit_small_patch16_224 | 48,602,344 | 185.4 |
| wide_resnet101_2 | 126,886,696 | 484.03 |
| wide_resnet50_2 | 68,883,240 | 262.77 |

Credits

Most of the weights were trained by other people and adapted to glasses; the original authors deserve the credit.
