Example code
from torchvision import models
# Load VGG16 with ImageNet-pretrained weights and print its layer structure
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
print(vgg)
torchvision models
Classification
Model | Description |
AlexNet | |
ConvNeXt | |
DenseNet | |
EfficientNet | |
EfficientNetV2 | |
GoogLeNet | |
Inception V3 | |
MaxVit | |
MNASNet | |
MobileNet V2 | |
MobileNet V3 | |
RegNet | |
ResNet | |
ResNeXt | |
ShuffleNet V2 | |
SqueezeNet | |
SwinTransformer | |
VGG | |
VisionTransformer | |
Wide ResNet | |
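All of the classification models above follow the same pattern as the VGG16 example: a builder function paired with a weights enum. A minimal sketch of full inference (assuming the weights-metadata API with weights.transforms() and weights.meta["categories"], and a random tensor standing in for a real image):

import torch
from torchvision import models

# Load ResNet-50 with ImageNet-pretrained weights; any model listed above works the same way
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()      # preprocessing pipeline bundled with the weights
img = torch.rand(3, 224, 224)          # placeholder for a real image tensor (C, H, W)
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)
top_prob, top_idx = probs.topk(1)
print(weights.meta["categories"][top_idx.item()], float(top_prob))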
Semantic segmentation
Object detection, instance segmentation and person keypoint detection
Model | Description |
Faster R-CNN | |
FCOS | |
RetinaNet | |
SSD | |
SSDlite | |
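Unlike the classification models, the detection models take a list of image tensors and return a list of per-image dicts. A minimal sketch with Faster R-CNN (assuming the torchvision.models.detection builders and their default weights enums; the random tensor stands in for a real image):

import torch
from torchvision.models import detection

# Load a pretrained Faster R-CNN (ResNet-50 FPN backbone) for inference
weights = detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = detection.fasterrcnn_resnet50_fpn(weights=weights).eval()

images = [torch.rand(3, 480, 640)]     # detection models take a list of (C, H, W) tensors in [0, 1]
with torch.no_grad():
    outputs = model(images)
# each element of outputs is a dict with "boxes", "labels" and "scores"
print(outputs[0]["boxes"].shape, outputs[0]["scores"][:5])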
Instance segmentation
Keypoint detection
Video classification
Model | Description |
Video MViT | |
Video ResNet | |
Video S3D | |
Video SwinTransformer | |
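The video classification models expect 5-D clips shaped (N, C, T, H, W). A minimal sketch with Video ResNet (r3d_18), assuming its Kinetics-400 weights enum and categories metadata, with a random clip in place of real video frames:

import torch
from torchvision.models import video

# Load a pretrained Video ResNet (R3D-18) and classify a dummy 16-frame clip
weights = video.R3D_18_Weights.DEFAULT
model = video.r3d_18(weights=weights).eval()

clip = torch.rand(1, 3, 16, 112, 112)  # (N, C, T, H, W) placeholder clip
with torch.no_grad():
    scores = model(clip)
print(weights.meta["categories"][scores.argmax(dim=1).item()])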
Optical flow
torchaudio models
Model | Description |
Conformer | Conformer architecture introduced in //Conformer: Convolution-augmented Transformer for Speech Recognition// [Gulati //et al.//, 2020]. |
ConvTasNet | Conv-TasNet architecture introduced in //Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation// [Luo and Mesgarani, 2019]. |
DeepSpeech | DeepSpeech architecture introduced in //Deep Speech: Scaling up end-to-end speech recognition// [Hannun //et al.//, 2014]. |
Emformer | Emformer architecture introduced in //Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition// [Shi //et al.//, 2021]. |
HDemucs | Hybrid Demucs model from //Hybrid Spectrogram and Waveform Source Separation// [Défossez, 2021]. |
HuBERTPretrainModel | HuBERT model used for pretraining in //HuBERT// [Hsu //et al.//, 2021]. |
RNNT | Recurrent neural network transducer (RNN-T) model. |
RNNTBeamSearch | Beam search decoder for RNN-T model. |
SquimObjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR). |
SquimSubjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)). |
Tacotron2 | Tacotron2 model from //Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions// [Shen //et al.//, 2018] based on the implementation from Nvidia Deep Learning Examples. |
Wav2Letter | Wav2Letter model architecture from //Wav2Letter: an End-to-End ConvNet-based Speech Recognition System// [Collobert //et al.//, 2016]. |
Wav2Vec2Model | Acoustic model used in //wav2vec 2.0// [Baevski //et al.//, 2020]. |
WaveRNN | WaveRNN model from //Efficient Neural Audio Synthesis// [Kalchbrenner //et al.//, 2018] based on the implementation from fatchord/WaveRNN. |
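Unlike torchvision, the classes above are bare architectures under torchaudio.models; pretrained weights are distributed separately through torchaudio.pipelines bundles. A minimal sketch, assuming the WAV2VEC2_ASR_BASE_960H bundle and one second of random audio in place of a real recording:

import torch
import torchaudio

# A pipeline bundle ties pretrained weights, sample rate and labels to a model class
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model().eval()                  # Wav2Vec2Model with an ASR head

waveform = torch.rand(1, int(bundle.sample_rate))  # (channel, time) dummy audio, 1 second
with torch.no_grad():
    emission, _ = model(waveform)                  # frame-level logits over the label set
print(emission.shape, bundle.get_labels()[:5])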