Introduction to PyTorch Built-in Models

Created: 2025-03-12
Updated: 2025-03-12

Example code

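from torchvision import models

# Load VGG-16 with ImageNet-1k pretrained weights (downloaded on first use)
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
# Print the layer structure of the model
print(vgg)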

torchvision models

Classification

Model
AlexNet
ConvNeXt
DenseNet
EfficientNet
EfficientNetV2
GoogLeNet
Inception V3
MaxVit
MNASNet
MobileNet V2
MobileNet V3
RegNet
ResNet
ResNeXt
ShuffleNet V2
SqueezeNet
SwinTransformer
VGG
VisionTransformer
Wide ResNet

Semantic segmentation

Model
DeepLabV3
FCN
LRASPP

Object detection, instance segmentation, and human keypoint detection

Model
Faster R-CNN
FCOS
RetinaNet
SSD
SSDlite

Instance segmentation

Model
Mask R-CNN

Keypoint detection

Model
Keypoint R-CNN

Video classification

Model
Video MViT
Video ResNet
Video S3D
Video SwinTransformer

Optical flow

Model
RAFT

torchaudio models

Model | Description
Conformer | Conformer architecture introduced in //Conformer: Convolution-augmented Transformer for Speech Recognition// [Gulati //et al.//, 2020].
ConvTasNet | Conv-TasNet architecture introduced in //Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation// [Luo and Mesgarani, 2019].
DeepSpeech | DeepSpeech architecture introduced in //Deep Speech: Scaling up end-to-end speech recognition// [Hannun //et al.//, 2014].
Emformer | Emformer architecture introduced in //Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition// [Shi //et al.//, 2021].
HDemucs | Hybrid Demucs model from //Hybrid Spectrogram and Waveform Source Separation// [Défossez, 2021].
HuBERTPretrainModel | HuBERT model used for pretraining in //HuBERT// [Hsu //et al.//, 2021].
RNNT | Recurrent neural network transducer (RNN-T) model.
RNNTBeamSearch | Beam search decoder for RNN-T model.
SquimObjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR).
SquimSubjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)).
Tacotron2 | Tacotron2 model from //Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions// [Shen //et al.//, 2018] based on the implementation from Nvidia Deep Learning Examples.
Wav2Letter | Wav2Letter model architecture from //Wav2Letter: an End-to-End ConvNet-based Speech Recognition System// [Collobert //et al.//, 2016].
Wav2Vec2Model | Acoustic model used in //wav2vec 2.0// [Baevski //et al.//, 2020].
WaveRNN | WaveRNN model from //Efficient Neural Audio Synthesis// [Kalchbrenner //et al.//, 2018] based on the implementation from fatchord/WaveRNN.