Example code
from torchvision import models
# Load VGG16 with ImageNet-pretrained weights and print its layer structure
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
print(vgg)
torchvision models
Classification
Model | Description |
AlexNet | |
ConvNeXt | |
DenseNet | |
EfficientNet | |
EfficientNetV2 | |
GoogLeNet | |
Inception V3 | |
MaxVit | |
MNASNet | |
MobileNet V2 | |
MobileNet V3 | |
RegNet | |
ResNet | |
ResNeXt | |
ShuffleNet V2 | |
SqueezeNet | |
SwinTransformer | |
VGG | |
VisionTransformer | |
Wide ResNet | |
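All of the classification models above follow the same pattern as the VGG16 example: a builder function paired with a weights enum. A minimal sketch of full inference (assuming the weights-metadata API with weights.transforms() and weights.meta["categories"], and a random tensor standing in for a real image):

import torch
from torchvision import models

# Load ResNet-50 with ImageNet-pretrained weights; any model listed above works the same way
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()      # preprocessing pipeline bundled with the weights
img = torch.rand(3, 224, 224)          # placeholder for a real image tensor (C, H, W)
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)
top_prob, top_idx = probs.topk(1)
print(weights.meta["categories"][top_idx.item()], float(top_prob))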
Semantic segmentation
Object detection, instance segmentation and person keypoint detection
Model | Description |
Faster R-CNN | |
FCOS | |
RetinaNet | |
SSD | |
SSDlite | |
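Unlike the classification models, the detection models take a list of image tensors and return a list of per-image dicts. A minimal sketch with Faster R-CNN (assuming the torchvision.models.detection builders and their default weights enums; the random tensor stands in for a real image):

import torch
from torchvision.models import detection

# Load a pretrained Faster R-CNN (ResNet-50 FPN backbone) for inference
weights = detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = detection.fasterrcnn_resnet50_fpn(weights=weights).eval()

images = [torch.rand(3, 480, 640)]     # detection models take a list of (C, H, W) tensors in [0, 1]
with torch.no_grad():
    outputs = model(images)
# each element of outputs is a dict with "boxes", "labels" and "scores"
print(outputs[0]["boxes"].shape, outputs[0]["scores"][:5])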
Instance segmentation
Keypoint detection
Video classification
Model | Description |
Video MViT | |
Video ResNet | |
Video S3D | |
Video SwinTransformer | |
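The video classification models expect 5-D clips shaped (N, C, T, H, W). A minimal sketch with Video ResNet (r3d_18), assuming its Kinetics-400 weights enum and categories metadata, with a random clip in place of real video frames:

import torch
from torchvision.models import video

# Load a pretrained Video ResNet (R3D-18) and classify a dummy 16-frame clip
weights = video.R3D_18_Weights.DEFAULT
model = video.r3d_18(weights=weights).eval()

clip = torch.rand(1, 3, 16, 112, 112)  # (N, C, T, H, W) placeholder clip
with torch.no_grad():
    scores = model(clip)
print(weights.meta["categories"][scores.argmax(dim=1).item()])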
Optical flow
torchaudio models
Model | Description |
Conformer | Conformer architecture introduced in //Conformer: Convolution-augmented Transformer for Speech Recognition// [Gulati //et al.//, 2020]. |
ConvTasNet | Conv-TasNet architecture introduced in //Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation// [Luo and Mesgarani, 2019]. |
DeepSpeech | DeepSpeech architecture introduced in //Deep Speech: Scaling up end-to-end speech recognition// [Hannun //et al.//, 2014]. |
Emformer | Emformer architecture introduced in //Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition// [Shi //et al.//, 2021]. |
HDemucs | Hybrid Demucs model from //Hybrid Spectrogram and Waveform Source Separation// [Défossez, 2021]. |
HuBERTPretrainModel | HuBERT model used for pretraining in //HuBERT// [Hsu //et al.//, 2021]. |
RNNT | Recurrent neural network transducer (RNN-T) model. |
RNNTBeamSearch | Beam search decoder for RNN-T model. |
SquimObjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR). |
SquimSubjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)). |
Tacotron2 | Tacotron2 model from //Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions// [Shen //et al.//, 2018] based on the implementation from Nvidia Deep Learning Examples. |
Wav2Letter | Wav2Letter model architecture from //Wav2Letter: an End-to-End ConvNet-based Speech Recognition System// [Collobert //et al.//, 2016]. |
Wav2Vec2Model | Acoustic model used in //wav2vec 2.0// [Baevski //et al.//, 2020]. |
WaveRNN | WaveRNN model from //Efficient Neural Audio Synthesis// [Kalchbrenner //et al.//, 2018] based on the implementation from fatchord/WaveRNN. |
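Unlike torchvision, the classes above are bare architectures under torchaudio.models; pretrained weights are distributed separately through torchaudio.pipelines bundles. A minimal sketch, assuming the WAV2VEC2_ASR_BASE_960H bundle and one second of random audio in place of a real recording:

import torch
import torchaudio

# A pipeline bundle ties pretrained weights, sample rate and labels to a model class
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model().eval()                  # Wav2Vec2Model with an ASR head

waveform = torch.rand(1, int(bundle.sample_rate))  # (channel, time) dummy audio, 1 second
with torch.no_grad():
    emission, _ = model(waveform)                  # frame-level logits over the label set
print(emission.shape, bundle.get_labels()[:5])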