{
"cells": [
{
"cell_type": "markdown",
"id": "3b57686b-7ac8-4897-bf76-3d982b1ff8da",
"metadata": {},
"source": [
"<p style=\"text-align: center;\"><img alt=\"school-logo\" src=\"../images/school_logo.png\" style=\"zoom: 50%;\" /></p>\n",
"\n",
"<h1 align=\"center\">本科生《深度学习》课程<br>实验报告</h1>\n",
"<div style=\"text-align: center;\">\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">课程名称</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">深度学习</span></div>\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">实验题目</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">Pytorch基本操作</span></div>\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">学号</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">21281280</span></div>\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">姓名</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">柯劲帆</span></div>\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">班级</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">物联网2101班</span></div>\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">指导老师</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">张淳杰</span></div>\n",
" <div><span style=\"display: inline-block; width: 65px; text-align: center;\">报告日期</span><span style=\"display: inline-block; width: 25px;\">:</span><span style=\"display: inline-block; width: 210px; font-weight: bold; text-align: left;\">2023年10月9日</span></div>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "e24aa17e-faf9-4d69-9eae-43159116b56f",
"metadata": {},
"source": [
"实验环境:\n",
"- OSUbuntu 22.04 (Kernel: 6.2.0-34-generic)\n",
"- CPU12th Gen Intel(R) Core(TM) i7-12700H\n",
"- GPUNVIDIA GeForce RTX 3070 Ti Laptop\n",
"- cuda: 12.2\n",
"- conda: miniconda 23.9.0\n",
"- python3.10.13\n",
"- pytorch2.1.0"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a4e12268-bad4-44c4-92d5-883624d93e25",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import torch\n",
"from torch.autograd import Variable\n",
"from torch.utils.data import Dataset, DataLoader\n",
"from torch import nn\n",
"from torchvision import datasets, transforms"
]
},
{
"cell_type": "markdown",
"id": "cc7f0ce5-d613-425b-807c-78115632cd80",
"metadata": {},
"source": [
"引用相关库。"
]
},
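{
"cell_type": "markdown",
"id": "7c1d2e3f-4a5b-4c6d-8e9f-0a1b2c3d4e5f",
"metadata": {},
"source": [
"A note on the imports: `torch.autograd.Variable` has been deprecated since PyTorch 0.4, and on PyTorch 2.1 `Variable(tensor)` simply returns the tensor itself. The `Variable(...)` wrappers that appear later in this notebook are therefore redundant; plain tensors carry autograd on their own, as in this minimal sketch:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"# autograd works on plain tensors; no Variable wrapper is needed\n",
"x = torch.tensor(1.0, requires_grad=True)\n",
"y = x ** 2\n",
"y.backward()\n",
"print(x.grad)  # tensor(2.)\n",
"```"
]
},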
{
"cell_type": "markdown",
"id": "59a43d35-56ac-4ade-995d-1c6fcbcd1262",
"metadata": {},
"source": [
"# 一、Pytorch基本操作考察\n",
"## 题目2\n",
"**使用 𝐓𝐞𝐧𝐬𝐨𝐫 初始化一个 𝟏×𝟑 的矩阵 𝑴 和一个 𝟐×𝟏 的矩阵 𝑵,对两矩阵进行减法操作(要求实现三种不同的形式),给出结果并分析三种方式的不同(如果出现报错,分析报错的原因),同时需要指出在计算过程中发生了什么。**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "79ea46db-cf49-436c-9b5b-c6562d0da9e2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"方法1的结果:\n",
"tensor([[-3, -2, -1],\n",
" [-4, -3, -2]])\n",
"方法2的结果:\n",
"tensor([[-3, -2, -1],\n",
" [-4, -3, -2]])\n",
"方法3的结果:\n",
"tensor([[-3, -2, -1],\n",
" [-4, -3, -2]])\n"
]
}
],
"source": [
"A = torch.tensor([[1, 2, 3]])\n",
"\n",
"B = torch.tensor([[4],\n",
" [5]])\n",
"\n",
"# 方法1: 使用PyTorch的减法操作符\n",
"result1 = A - B\n",
"\n",
"# 方法2: 使用PyTorch的sub函数\n",
"result2 = torch.sub(A, B)\n",
"\n",
"# 方法3: 手动实现广播机制并作差\n",
"def my_sub(a:torch.Tensor, b:torch.Tensor):\n",
" if not ((a.size(0) == 1 and b.size(1) == 1) or (a.size(1) == 1 and b.size(0) == 1)):\n",
" raise ValueError(\"输入的张量大小无法满足广播机制的条件。\")\n",
" else:\n",
" target_shape = torch.Size([max(A.size(0), B.size(0)), max(A.size(1), B.size(1))])\n",
" A_broadcasted = A.expand(target_shape)\n",
" B_broadcasted = B.expand(target_shape)\n",
" result = torch.zeros(target_shape, dtype=torch.int64).to(device=A_broadcasted.device)\n",
" for i in range(target_shape[0]):\n",
" for j in range(target_shape[1]):\n",
" result[i, j] = A_broadcasted[i, j] - B_broadcasted[i, j]\n",
" return result\n",
"\n",
"result3 = my_sub(A, B)\n",
"\n",
"print(\"方法1的结果:\")\n",
"print(result1)\n",
"print(\"方法2的结果:\")\n",
"print(result2)\n",
"print(\"方法3的结果:\")\n",
"print(result3)"
]
},
{
"cell_type": "markdown",
"id": "bd9bd5cc-b6da-4dd6-a599-76498bc5247d",
"metadata": {},
"source": [
"第1、2、3种减法形式实质是一样的。\n",
"\n",
"步骤如下:\n",
"1. 对A、B两个张量进行广播将A、B向广播的方向复制得到两个$\\max(A.size(0), B.size(0))\\times \\max(A.size(1), B.size(1))$的张量;\n",
"2. 对广播后的两个张量作差,尺寸不变。\n",
"\n",
"第1种减法形式和第2种是等价的前者是后者的符号化表示。\n",
"\n",
"第3种形式是手动实现的将上述两个步骤分别手动实现了。但是torch.Tensor还内置了其他机制这里仅模拟了广播和作差。"
]
},
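{
"cell_type": "markdown",
"id": "8d2e3f4a-5b6c-4d7e-9f0a-1b2c3d4e5f60",
"metadata": {},
"source": [
"On the error analysis the problem asks for: $1\\times 3$ and $2\\times 1$ broadcast to $2\\times 3$ because every dimension pair contains a 1, but two shapes whose corresponding dimensions are neither equal nor 1 cannot be broadcast, and PyTorch raises a RuntimeError. A minimal sketch (shapes chosen here purely for illustration):\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"a = torch.zeros(2, 3)\n",
"b = torch.zeros(3, 2)\n",
"try:\n",
"    a - b  # dim 0: 2 vs 3, dim 1: 3 vs 2 -- neither equal nor 1\n",
"except RuntimeError as e:\n",
"    print(e)  # \"The size of tensor a (3) must match the size of tensor b (2) ...\"\n",
"```"
]
},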
{
"cell_type": "markdown",
"id": "2489a3ad-f6ff-4561-bb26-e02654090b98",
"metadata": {},
"source": [
"## 题目2\n",
"1. **利用Tensor创建两个大小分别$3\\times 2$和$4\\times 2$的随机数矩阵$P$和$Q$,要求服从均值为$0$,标准差$0.01$为的正态分布;**\n",
"2. **对第二步得到的矩阵$Q$进行形状变换得到$Q$的转置$Q^T$**\n",
"3. **对上述得到的矩阵$P$和矩阵$Q^T$求矩阵相乘。**"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "41e4ee02-1d05-4101-b3f0-477bac0277fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"矩阵 P:\n",
"tensor([[ 0.0069, 0.0082],\n",
" [-0.0052, -0.0124],\n",
" [ 0.0055, -0.0014]])\n",
"矩阵 Q:\n",
"tensor([[ 0.0050, 0.0075],\n",
" [ 0.0161, 0.0070],\n",
" [-0.0009, -0.0014],\n",
" [-0.0003, 0.0024]])\n",
"矩阵 QT:\n",
"tensor([[ 0.0050, 0.0161, -0.0009, -0.0003],\n",
" [ 0.0075, 0.0070, -0.0014, 0.0024]])\n",
"矩阵相乘的结果:\n",
"tensor([[ 9.6016e-05, 1.6860e-04, -1.7451e-05, 1.8011e-05],\n",
" [-1.1894e-04, -1.7065e-04, 2.1900e-05, -2.8712e-05],\n",
" [ 1.6918e-05, 7.8455e-05, -2.7165e-06, -4.9904e-06]])\n"
]
}
],
"source": [
"mean = 0\n",
"stddev = 0.01\n",
"\n",
"P = torch.normal(mean=mean, std=stddev, size=(3, 2))\n",
"Q = torch.normal(mean=mean, std=stddev, size=(4, 2))\n",
"\n",
"print(\"矩阵 P:\")\n",
"print(P)\n",
"print(\"矩阵 Q:\")\n",
"print(Q)\n",
"\n",
"# 对矩阵Q进行转置操作得到矩阵Q的转置Q^T\n",
"QT = Q.T\n",
"print(\"矩阵 QT:\")\n",
"print(QT)\n",
"\n",
"# 计算矩阵P和矩阵Q^T的矩阵相乘\n",
"result = torch.matmul(P, QT)\n",
"print(\"矩阵相乘的结果:\")\n",
"print(result)"
]
},
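{
"cell_type": "markdown",
"id": "9e3f4a5b-6c7d-4e8f-a0b1-2c3d4e5f6071",
"metadata": {},
"source": [
"One caveat on \"reshaping to obtain the transpose\": `reshape`/`view` only reinterpret the elements in storage order and do **not** transpose, so `Q.reshape(2, 4)` has the right shape but the wrong layout, while `Q.T` (or `Q.permute(1, 0)`) swaps the dimensions properly. A small sketch of the difference:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"Q = torch.arange(8).reshape(4, 2)\n",
"print(Q.T)              # true transpose, shape (2, 4)\n",
"print(Q.reshape(2, 4))  # same shape, but rows are just the flattened data\n",
"print(torch.equal(Q.T, Q.reshape(2, 4)))  # False\n",
"```"
]
},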
{
"cell_type": "markdown",
"id": "cea9cb6d-adde-4e08-b9f2-8c417abf4231",
"metadata": {},
"source": [
"## 题目2\n",
"**给定公式$ y_3=y_1+y_2=𝑥^2+𝑥^3$,且$x=1$。利用学习所得到的Tensor的相关知识求$y_3$对$x$的梯度,即$\\frac{dy_3}{dx}$。**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "951512cd-d915-4d04-959f-eb99d1971e2d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"仅通过y_1传递的梯度: 2.0\n",
"仅通过y_2传递的梯度: 3.0\n",
"dy_3/dx: 5.0\n"
]
}
],
"source": [
"x = torch.tensor(1.0, requires_grad=True)\n",
"\n",
"y_1 = x ** 2\n",
"with torch.no_grad():\n",
" y_2 = x ** 3\n",
"y_3 = y_1 + y_2\n",
"y_3.backward()\n",
"print(\"仅通过y_1传递的梯度: \", x.grad.item())\n",
"\n",
"x.grad.data.zero_()\n",
"with torch.no_grad():\n",
" y_1 = x ** 2\n",
"y_2 = x ** 3\n",
"y_3 = y_1 + y_2\n",
"y_3.backward()\n",
"print(\"仅通过y_2传递的梯度: \", x.grad.item())\n",
"\n",
"x.grad.data.zero_()\n",
"y_1 = x ** 2\n",
"y_2 = x ** 3\n",
"y_3 = y_1 + y_2\n",
"y_3.backward()\n",
"\n",
"print(\"dy_3/dx: \", x.grad.item())"
]
},
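{
"cell_type": "markdown",
"id": "af4a5b6c-7d8e-4f90-b1c2-3d4e5f607182",
"metadata": {},
"source": [
"The code above isolates each branch with `torch.no_grad()`, which detaches that branch from the graph, so the two partial gradients ($2x$ from $y_1$ and $3x^2$ from $y_2$) can be observed separately before summing to $\\frac{dy_3}{dx} = 2x + 3x^2 = 5$ at $x=1$. The same derivative can also be obtained without mutating `x.grad`, via `torch.autograd.grad`; a minimal sketch:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"x = torch.tensor(1.0, requires_grad=True)\n",
"y_3 = x ** 2 + x ** 3\n",
"(grad,) = torch.autograd.grad(y_3, x)  # returns the gradient instead of accumulating it\n",
"print(grad)  # tensor(5.)\n",
"```"
]
},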
{
"cell_type": "markdown",
"id": "3269dbf6-889a-49eb-8094-1e588e1a6c30",
"metadata": {},
"source": [
"# 二、动手实现logistic回归\n",
"## 题目1\n",
"**要求动手从0实现 logistic 回归只借助Tensor和Numpy相关的库在人工构造的数据集上进行训练和测试并从loss以及训练集上的准确率等多个角度对结果进行分析可借助nn.BCELoss或nn.BCEWithLogitsLoss作为损失函数从零实现二元交叉熵为选作**"
]
},
{
"cell_type": "markdown",
"id": "bcd12aa9-f187-4d88-8c59-af6d16107edb",
"metadata": {},
"source": [
"给定预测概率$ \\left( \\hat{y} \\right) $和目标标签$ \\left( y \\right)$通常是0或1BCELoss的计算公式如下\n",
"$$\n",
" \\text{BCELoss}(\\hat{y}, y) = -\\frac{1}{N} \\sum_{i=1}^{N} \\left(y_i \\cdot \\log(\\hat{y}_i) + (1 - y_i) \\cdot \\log(1 - \\hat{y}_i)\\right) \n",
"$$\n",
"其中,$\\left( N \\right) $是样本数量,$\\left( \\hat{y}_i \\right) $表示模型的预测概率向量中的第$ \\left( i \\right) $个元素,$\\left( y_i \\right) $表示实际的目标标签中的第$ \\left( i \\right) $个元素。在二分类问题中,$\\left( y_i \\right) $通常是0或1。这个公式表示对所有样本的二分类交叉熵损失进行了求和并取平均。\n",
"\n",
"因此BCELoss的手动实现如下。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e31b86ec-4114-48dd-8d73-fe4e0686419a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"输入:\n",
"tensor([0.6900])\n",
"标签:\n",
"tensor([1.])\n",
"My_BCELoss损失值: 0.37110066413879395\n",
"nn.BCELoss损失值: 0.37110066413879395\n"
]
}
],
"source": [
"class My_BCELoss:\n",
" def __call__(self, prediction: torch.Tensor, target: torch.Tensor):\n",
" loss = -torch.mean(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))\n",
" return loss\n",
"\n",
"\n",
"# 测试\n",
"prediction = torch.sigmoid(torch.tensor([0.8]))\n",
"target = torch.tensor([1.0])\n",
"print(f\"输入:\\n{prediction}\")\n",
"print(f\"标签:\\n{target}\")\n",
"\n",
"my_bce_loss = My_BCELoss()\n",
"my_loss = my_bce_loss(prediction, target)\n",
"print(\"My_BCELoss损失值:\", my_loss.item())\n",
"\n",
"nn_bce_loss = nn.BCELoss()\n",
"nn_loss = nn_bce_loss(prediction, target)\n",
"print(\"nn.BCELoss损失值:\", nn_loss.item())"
]
},
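{
"cell_type": "markdown",
"id": "b05b6c7d-8e9f-4a01-82d3-4e5f60718293",
"metadata": {},
"source": [
"One detail worth noting: `My_BCELoss` evaluates `torch.log(prediction)` directly, so a prediction of exactly 0 or 1 yields `-inf` and then `nan` gradients, whereas `nn.BCELoss` clamps its log terms (at -100) to stay finite. A sketch of a more numerically robust variant, with the class name and `eps` value being my own choices:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"class My_BCELoss_Stable:\n",
"    \"\"\"BCE with predictions clamped away from 0 and 1 to avoid log(0).\"\"\"\n",
"    def __call__(self, prediction: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):\n",
"        p = prediction.clamp(min=eps, max=1 - eps)\n",
"        return -torch.mean(target * torch.log(p) + (1 - target) * torch.log(1 - p))\n",
"```"
]
},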
{
"cell_type": "markdown",
"id": "345b0300-8808-4c43-9bf9-05a7e6e1f5af",
"metadata": {},
"source": [
"Optimizer的实现较为简单。\n",
"\n",
"主要实现:\n",
"- 传入参数:`__init__()`\n",
"- 对传入的参数进行更新:`step()`\n",
"- 清空传入参数存储的梯度:`zero_grad()`\n",
"\n",
"但是有一点需要注意,就是需要将传进来的`params`参数转化为`list`类型。因为`nn.Module`的`parameters()`方法会以`<class 'generator'>`的类型返回模型的参数,但是该类型变量无法像`list`一样使用`for`循环遍历。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0297066c-9fc1-448d-bdcb-29a6f1519117",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x的初始值: 1.0\n",
"学习率: 0.1\n",
"y.backward()之后x的梯度: 2.0\n",
"optimizer_test.step()之后x的值: 0.800000011920929\n",
"optimizer_test.zero_grad()之后x的梯度: 0.0\n"
]
}
],
"source": [
"class My_Optimizer:\n",
" def __init__(self, params: list[torch.Tensor], lr: float):\n",
" self.params = list(params)\n",
" self.lr = lr\n",
"\n",
" def step(self):\n",
" for param in self.params:\n",
" param.data = param.data - self.lr * param.grad.data\n",
"\n",
" def zero_grad(self):\n",
" for param in self.params:\n",
" if param.grad is not None:\n",
" param.grad.data.zero_()\n",
"\n",
"\n",
"# 测试\n",
"x = torch.tensor(1.0, requires_grad=True)\n",
"print(\"x的初始值: \", x.item())\n",
"\n",
"optimizer_test = My_Optimizer([x], lr=0.1)\n",
"print(\"学习率: \", optimizer_test.lr)\n",
"\n",
"y = x ** 2\n",
"y.backward()\n",
"print(\"y.backward()之后x的梯度: \", x.grad.item())\n",
"\n",
"optimizer_test.step()\n",
"print(\"optimizer_test.step()之后x的值: \", x.item())\n",
"\n",
"optimizer_test.zero_grad()\n",
"print(\"optimizer_test.zero_grad()之后x的梯度: \", x.grad.item())"
]
},
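{
"cell_type": "markdown",
"id": "c16c7d8e-9fa0-4b12-8d3e-5f60718293a4",
"metadata": {},
"source": [
"A quick demonstration of why `list(params)` matters: `nn.Module.parameters()` returns a generator, which is exhausted after a single pass, so without the conversion the second traversal (e.g. in `zero_grad()` after `step()`) would silently see no parameters:\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn\n",
"\n",
"layer = nn.Linear(2, 2)\n",
"gen = layer.parameters()   # a generator\n",
"print(len(list(gen)))      # 2 -- the first pass consumes it\n",
"print(len(list(gen)))      # 0 -- exhausted, a second pass yields nothing\n",
"```"
]
},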
{
"cell_type": "markdown",
"id": "6ab83528-a88b-4d66-b0c9-b1315cf75c22",
"metadata": {},
"source": [
"线性层主要有一个权重weight和一个偏置bias。\n",
"线性层的数学公式如下:\n",
"$$\n",
"x:=x \\times weight^T+bias\n",
"$$\n",
"因此代码实现如下:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "8e18695a-d8c5-4f77-8b5c-de40d9240fb9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"输入:\n",
"tensor([[1.],\n",
" [2.]], requires_grad=True)\n",
"权重:\n",
"tensor([[ 0.4240],\n",
" [-0.2577],\n",
" [ 0.4972]])\n",
"偏置:\n",
"tensor([0.6298, 0.6243, 0.8217])\n",
"My_Linear输出\n",
"tensor([[1.0539, 0.3666, 1.3189],\n",
" [1.4779, 0.1089, 1.8161]], grad_fn=<AddBackward0>)\n",
"nn.Linear输出\n",
"tensor([[1.0539, 0.3666, 1.3189],\n",
" [1.4779, 0.1089, 1.8161]], grad_fn=<AddmmBackward0>)\n"
]
}
],
"source": [
"class My_Linear:\n",
" def __init__(self, input_feature: int, output_feature: int):\n",
" self.weight = torch.randn((output_feature, input_feature), requires_grad=True, dtype=torch.float32)\n",
" self.bias = torch.randn(1, requires_grad=True, dtype=torch.float32)\n",
" self.params = [self.weight, self.bias]\n",
"\n",
" def __call__(self, x: torch.Tensor):\n",
" return self.forward(x)\n",
"\n",
" def forward(self, x: torch.Tensor):\n",
" x = torch.matmul(x, self.weight.T) + self.bias\n",
" return x\n",
"\n",
" def to(self, device: str):\n",
" for param in self.params:\n",
" param.data = param.data.to(device=device)\n",
" return self\n",
"\n",
" def parameters(self):\n",
" return self.params\n",
"\n",
" \n",
"# 测试\n",
"my_linear = My_Linear(1, 3)\n",
"nn_linear = nn.Linear(1, 3)\n",
"my_linear.weight = nn_linear.weight.clone().requires_grad_()\n",
"my_linear.bias = nn_linear.bias.clone().requires_grad_()\n",
"x = torch.tensor([[1.], [2.]], requires_grad=True)\n",
"print(f\"输入:\\n{x}\")\n",
"print(f\"权重:\\n{my_linear.weight.data}\")\n",
"print(f\"偏置:\\n{my_linear.bias.data}\")\n",
"y_my_linear = my_linear(x)\n",
"print(f\"My_Linear输出\\n{y_my_linear}\")\n",
"y_nn_linear = nn_linear(x)\n",
"print(f\"nn.Linear输出\\n{y_nn_linear}\")"
]
},
{
"cell_type": "markdown",
"id": "5ff813cc-c1f0-4c73-a3e8-d6796ef5d366",
"metadata": {},
"source": [
"手动实现logistic回归模型。\n",
"\n",
"模型很简单主要由一个线性层和一个sigmoid层组成。\n",
"\n",
"Sigmoid函数又称为 Logistic函数是一种常用的激活函数通常用于神经网络的输出层或隐藏层其作用是将输入的实数值压缩到一个范围在0和1之间的数值。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e7de7e4b-a084-4793-812e-46e8550ecd8d",
"metadata": {},
"outputs": [],
"source": [
"class Model_2_1():\n",
" def __init__(self):\n",
" self.linear = My_Linear(1, 1)\n",
" self.params = self.linear.params\n",
"\n",
" def __call__(self, x):\n",
" return self.forward(x)\n",
"\n",
" def forward(self, x):\n",
" x = self.linear(x)\n",
" x = torch.sigmoid(x)\n",
" return x\n",
"\n",
" def to(self, device: str):\n",
" for param in self.params:\n",
" param.data = param.data.to(device=device)\n",
" return self\n",
"\n",
" def parameters(self):\n",
" return self.params"
]
},
{
"cell_type": "markdown",
"id": "e14acea9-e5ef-4c24-aea9-329647224ce1",
"metadata": {},
"source": [
"人工随机构造数据集。\n",
"\n",
"这里我遇到了比较大的问题。因为数据构建不合适,会导致后面的训练出现梯度爆炸。\n",
"\n",
"我采用随机产生数据后归一化的方法,即\n",
"$$\n",
"\\hat{x} = \\frac{x - \\text{min}_x}{\\text{max}_x - \\text{min}_x} \n",
"$$\n",
"将数据控制在合适的区间。\n",
"\n",
"我的y设置为$4-3\\times x + noise$noise为随机噪声。\n",
"\n",
"生成完x和y后进行归一化处理并写好DataLoader访问数据集的接口`__getitem__()`。"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c39fbafb-62e4-4b8c-9d65-6718d25f2970",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"测试数据集大小1000000\n",
"测试数据集第0对数据\n",
"x_0 = 0.5488133381316141\n",
"y_0 = 0.45217091576438073\n"
]
}
],
"source": [
"class My_Dataset(Dataset):\n",
" def __init__(self, data_size=1000000):\n",
" np.random.seed(0)\n",
" x = 2 * np.random.rand(data_size, 1)\n",
" noise = 0.2 * np.random.randn(data_size, 1)\n",
" y = 4 - 3 * x + noise\n",
" self.min_x, self.max_x = np.min(x), np.max(x)\n",
" min_y, max_y = np.min(y), np.max(y)\n",
" x = (x - self.min_x) / (self.max_x - self.min_x)\n",
" y = (y - min_y) / (max_y - min_y)\n",
" self.data = [[x[i][0], y[i][0]] for i in range(x.shape[0])]\n",
"\n",
" def __len__(self):\n",
" return len(self.data)\n",
"\n",
" def __getitem__(self, index):\n",
" x, y = self.data[index]\n",
" return x, y\n",
"\n",
"\n",
"# 测试,并后面的训练创建变量\n",
"dataset = My_Dataset()\n",
"dataset_size = len(dataset)\n",
"print(f\"测试数据集大小:{dataset_size}\")\n",
"x0, y0 = dataset[0]\n",
"print(f\"测试数据集第0对数据\")\n",
"print(f\"x_0 = {x0}\")\n",
"print(f\"y_0 = {y0}\")"
]
},
{
"cell_type": "markdown",
"id": "957a76a2-b306-47a8-912e-8fbf00cdfd42",
"metadata": {},
"source": [
"训练Logistic回归模型。\n",
"进行如下步骤:\n",
"1. 初始化超参数\n",
"2. 获取数据集\n",
"3. 初始化模型\n",
"4. 定义损失函数和优化器\n",
"5. 训练\n",
" 1. 从训练dataloader中获取批量数据\n",
" 2. 传入模型\n",
" 3. 使用损失函数计算与ground_truth的损失\n",
" 4. 使用优化器进行反向传播\n",
" 5. 循环以上步骤\n",
"6. 测试\n",
" 1. 设置测试数据\n",
" 2. 传入模型\n",
" 3. 得到预测值"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "5612661e-2809-4d46-96c2-33ee9f44116d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10, Loss: 688.6783249974251, Acc: 0.9766838179955138\n",
"Epoch 2/10, Loss: 679.506599009037, Acc: 0.992039453911494\n",
"Epoch 3/10, Loss: 677.644762635231, Acc: 0.9961844975781526\n",
"Epoch 4/10, Loss: 677.2690716981888, Acc: 0.998395304269398\n",
"Epoch 5/10, Loss: 677.1928514242172, Acc: 0.9993592246184307\n",
"Epoch 6/10, Loss: 677.1781670451164, Acc: 0.9996570376204033\n",
"Epoch 7/10, Loss: 677.1744618415833, Acc: 0.9998465339227576\n",
"Epoch 8/10, Loss: 677.1738814711571, Acc: 0.9998001679325041\n",
"Epoch 9/10, Loss: 677.1742851734161, Acc: 0.9998804348705138\n",
"Epoch 10/10, Loss: 677.1740592718124, Acc: 0.9999446971149187\n",
"Model weights: -0.0037125118542462587, bias: 0.017451055347919464\n",
"Prediction for test data: 0.5034345984458923\n"
]
}
],
"source": [
"learning_rate = 5e-2\n",
"num_epochs = 10\n",
"batch_size = 1024\n",
"device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
"\n",
"dataloader = DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True, num_workers=14, pin_memory=True)\n",
"\n",
"model = Model_2_1().to(device)\n",
"criterion = My_BCELoss()\n",
"optimizer = My_Optimizer(model.parameters(), lr=learning_rate)\n",
"\n",
"for epoch in range(num_epochs):\n",
" total_epoch_loss = 0\n",
" total_epoch_pred = 0\n",
" total_epoch_target = 0\n",
" for x, targets in dataloader:\n",
" optimizer.zero_grad()\n",
" \n",
" x = x.to(device).to(dtype=torch.float32)\n",
" targets = targets.to(device).to(dtype=torch.float32)\n",
" \n",
" x = x.unsqueeze(1)\n",
" y_pred = model(x)\n",
" loss = criterion(y_pred, targets)\n",
" total_epoch_loss += loss.item()\n",
" total_epoch_target += targets.sum().item()\n",
" total_epoch_pred += y_pred.sum().item()\n",
"\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" print(f\"Epoch {epoch + 1}/{num_epochs}, Loss: {total_epoch_loss}, \", end=\"\")\n",
" print(f\"Acc: {1 - abs(total_epoch_pred - total_epoch_target) / total_epoch_target}\")\n",
"\n",
"with torch.no_grad():\n",
" test_data = (np.array([[2]]) - dataset.min_x) / (dataset.max_x - dataset.min_x)\n",
" test_data = Variable(torch.tensor(test_data, dtype=torch.float32), requires_grad=False).to(device)\n",
" predicted = model(test_data).to(\"cpu\")\n",
" print(f\"Model weights: {model.linear.weight.item()}, bias: {model.linear.bias.item()}\")\n",
" print(f\"Prediction for test data: {predicted.item()}\")"
]
},
{
"cell_type": "markdown",
"id": "9e416582-a30d-4084-acc6-6e05f80a6aff",
"metadata": {},
"source": [
"## 题目2\n",
"**利用 torch.nn 实现 logistic 回归在人工构造的数据集上进行训练和测试并对结果进行分析并从loss以及训练集上的准确率等多个角度对结果进行分析**"
]
},
{
"cell_type": "markdown",
"id": "0460d125-7d03-44fe-845c-c4d13792e241",
"metadata": {},
"source": [
"使用torch.nn实现模型。\n",
"\n",
"将之前的Model_2_1中的手动实现函数改为torch.nn内置函数即可再加上继承nn.Module以使用torch.nn内置模型模板特性。"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "fa121afd-a1af-4193-9b54-68041e0ed068",
"metadata": {},
"outputs": [],
"source": [
"class Model_2_2(nn.Module):\n",
" def __init__(self):\n",
" super(Model_2_2, self).__init__()\n",
" self.linear = nn.Linear(1, 1, dtype=torch.float64)\n",
"\n",
" def forward(self, x):\n",
" x = self.linear(x)\n",
" x = torch.sigmoid(x)\n",
" return x"
]
},
{
"cell_type": "markdown",
"id": "176eee7e-4e3d-470e-8af2-8761bca039f8",
"metadata": {},
"source": [
"训练与测试过程与之前手动实现的几乎一致。仅有少量涉及数据类型dtype的代码需要更改以适应torch.nn的内置函数要求。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "93b0fdb6-be8b-4663-b59e-05ed19a9ea09",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10, Loss: 660.2008021697803, Acc: 0.9355364605682331\n",
"Epoch 2/10, Loss: 589.2025169091534, Acc: 0.9769773185253259\n",
"Epoch 3/10, Loss: 572.7106042209589, Acc: 0.9881629137259633\n",
"Epoch 4/10, Loss: 568.0903503441508, Acc: 0.9935173218188225\n",
"Epoch 5/10, Loss: 566.6528526848851, Acc: 0.9962586560919562\n",
"Epoch 6/10, Loss: 566.1778871576632, Acc: 0.9978209774304773\n",
"Epoch 7/10, Loss: 566.0143385848835, Acc: 0.9987369762885633\n",
"Epoch 8/10, Loss: 565.9605239629793, Acc: 0.9992563563084009\n",
"Epoch 9/10, Loss: 565.9402079010808, Acc: 0.9995321069396558\n",
"Epoch 10/10, Loss: 565.9281422200424, Acc: 0.9997496312356398\n",
"Model weights: -3.6833968323036084, bias: 1.8628376037952126\n",
"Prediction for test data: 0.13936666014014443\n"
]
}
],
"source": [
"learning_rate = 5e-2\n",
"num_epochs = 10\n",
"batch_size = 1024\n",
"device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
"\n",
"dataloader = DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True, num_workers=14, pin_memory=True)\n",
"\n",
"model = Model_2_2().to(device)\n",
"criterion = nn.BCELoss()\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n",
"\n",
"for epoch in range(num_epochs):\n",
" total_epoch_loss = 0\n",
" total_epoch_pred = 0\n",
" total_epoch_target = 0\n",
" for x, targets in dataloader:\n",
" optimizer.zero_grad()\n",
"\n",
" x = x.to(device)\n",
" targets = targets.to(device)\n",
"\n",
" x = x.unsqueeze(1)\n",
" targets = targets.unsqueeze(1)\n",
" y_pred = model(x)\n",
" loss = criterion(y_pred, targets)\n",
" total_epoch_loss += loss.item()\n",
" total_epoch_target += targets.sum().item()\n",
" total_epoch_pred += y_pred.sum().item()\n",
"\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" print(f\"Epoch {epoch + 1}/{num_epochs}, Loss: {total_epoch_loss}, \", end=\"\")\n",
" print(f\"Acc: {1 - abs(total_epoch_pred - total_epoch_target) / total_epoch_target}\")\n",
"\n",
"with torch.no_grad():\n",
" test_data = (np.array([[2]]) - dataset.min_x) / (dataset.max_x - dataset.min_x)\n",
" test_data = Variable(torch.tensor(test_data, dtype=torch.float64), requires_grad=False).to(device)\n",
" predicted = model(test_data).to(\"cpu\")\n",
" print(f\"Model weights: {model.linear.weight.item()}, bias: {model.linear.bias.item()}\")\n",
" print(f\"Prediction for test data: {predicted.item()}\")"
]
},
{
"cell_type": "markdown",
"id": "e6bff679-f8d2-46cc-bdcb-82af7dab38b3",
"metadata": {},
"source": [
"对比发现手动实现的损失函数和优化器与torch.nn的内置损失函数和优化器相比表现差不多。\n",
"\n",
"但是为什么相同分布的数据集训练出的权重和偏置,以及预测结果存在较大差别,这个问题的原因还有待我探究。"
]
},
{
"cell_type": "markdown",
"id": "ef41d7fa-c2bf-4024-833b-60af0a87043a",
"metadata": {},
"source": [
"# 三、动手实现softmax回归\n",
"\n",
"## 问题1\n",
"\n",
"**要求动手从0实现softmax回归只借助Tensor和Numpy相关的库在Fashion-MNIST数据集上进行训练和测试并从loss、训练集以及测试集上的准确率等多个角度对结果进行分析要求从零实现交叉熵损失函数**"
]
},
{
"cell_type": "markdown",
"id": "3c356760-75a8-4814-ba69-73b270396a4e",
"metadata": {},
"source": [
"手动实现nn.one_hot()。\n",
"\n",
"one-hot向量用于消除线性标签值所映射的类别的非线性。\n",
"\n",
"one-hot向量是使用一个长度为分类数量的数组表示标签值其中有且仅有1个值为为1该值的下标为标签值其余为0。\n",
"\n",
"原理很简单,步骤如下:\n",
"1. 初始化全零的张量,大小为(标签数量,分类数量);\n",
"2. 将标签值映射到全零张量的\\[下标,标签值\\]中将该位置为1\n",
"3. 返回修改后的张量即是ont-hot向量。"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "e605f1b0-1d32-410f-bddf-402a85ccc9ff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"输入:\n",
"tensor([2, 1, 0])\n",
"my_one_hot输出\n",
"tensor([[0, 0, 1, 0, 0],\n",
" [0, 1, 0, 0, 0],\n",
" [1, 0, 0, 0, 0]])\n",
"nn.functional.one_hot输出\n",
"tensor([[0, 0, 1, 0, 0],\n",
" [0, 1, 0, 0, 0],\n",
" [1, 0, 0, 0, 0]])\n"
]
}
],
"source": [
"def my_one_hot(indices: torch.Tensor, num_classes: int):\n",
" one_hot_tensor = torch.zeros(len(indices), num_classes).to(indices.device).to(dtype=torch.int64)\n",
" one_hot_tensor.scatter_(1, indices.view(-1, 1), 1)\n",
" return one_hot_tensor\n",
"\n",
"\n",
"# 测试\n",
"x = torch.tensor([2, 1, 0], dtype=torch.int64)\n",
"print(f\"输入:\\n{x}\")\n",
"\n",
"x_my_onehot = my_one_hot(x, 5)\n",
"print(f\"my_one_hot输出\\n{x_my_onehot}\")\n",
"\n",
"x_nn_F_onehot = nn.functional.one_hot(x, 5)\n",
"print(f\"nn.functional.one_hot输出\\n{x_nn_F_onehot}\")"
]
},
{
"cell_type": "markdown",
"id": "902603a6-bfb9-4ce3-bd0d-b00cebb1d3cb",
"metadata": {},
"source": [
"手动实现CrossEntropyLoss。\n",
"\n",
"CrossEntropyLoss由一个log_softmax和一个nll_loss组成。\n",
"\n",
"softmax的数学表达式如下\n",
"$$\n",
"\\text{softmax}(y_i) = \\frac{e^{y_i - \\text{max}(y)}}{\\sum_{j=1}^{N} e^{y_j - \\text{max}(y)}} \n",
"$$\n",
"log_softmax即为$\\log\\left(softmax\\left(y\\right)\\right)$。\n",
"\n",
"CrossEntropyLoss的数学表达式如下\n",
"$$\n",
"\\text{CrossEntropyLoss}(y, \\hat{y}) = -\\frac{1}{N} \\sum_{i=1}^{N} \\hat{y}_i \\cdot \\log(\\text{softmax}(y_i)) \n",
"$$\n",
"\n",
"故代码如下:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "759a3bb2-b5f4-4ea5-a2d7-15f0c4cdd14b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"输入:\n",
"tensor([[ 0.7600, 0.4269, 0.7948, -0.6086, 1.2527],\n",
" [-0.4749, 0.5720, -0.0164, -0.2126, -0.0410],\n",
" [ 1.3269, 1.8524, -0.9815, 0.0156, 1.6971]], requires_grad=True)\n",
"标签:\n",
"tensor([[0., 1., 0., 0., 0.],\n",
" [0., 0., 0., 1., 0.],\n",
" [1., 0., 0., 0., 0.]])\n",
"My_CrossEntropyLoss损失值: 1.7417106628417969\n",
"nn.CrossEntropyLoss损失值: 1.7417105436325073\n"
]
}
],
"source": [
"class My_CrossEntropyLoss:\n",
" def __call__(self, predictions: torch.Tensor, targets: torch.Tensor):\n",
" max_values = torch.max(predictions, dim=1, keepdim=True).values\n",
" exp_values = torch.exp(predictions - max_values)\n",
" softmax_output = exp_values / torch.sum(exp_values, dim=1, keepdim=True)\n",
" log_probs = torch.log(softmax_output)\n",
" \n",
" nll_loss = -torch.sum(targets * log_probs, dim=1)\n",
" average_loss = torch.mean(nll_loss)\n",
" return average_loss\n",
"\n",
" \n",
"# 测试\n",
"input = torch.randn(3, 5, requires_grad=True)\n",
"target = torch.randn(3, 5).softmax(dim=1).argmax(1)\n",
"target = torch.nn.functional.one_hot(target, num_classes=5).to(dtype=torch.float32)\n",
"print(f\"输入:\\n{input}\")\n",
"print(f\"标签:\\n{target}\")\n",
"\n",
"my_crossentropyloss = My_CrossEntropyLoss()\n",
"my_loss = my_crossentropyloss(input, target)\n",
"print(\"My_CrossEntropyLoss损失值:\", my_loss.item())\n",
"\n",
"nn_crossentropyloss = nn.CrossEntropyLoss()\n",
"nn_loss = nn_crossentropyloss(input, target)\n",
"print(\"nn.CrossEntropyLoss损失值:\", nn_loss.item())"
]
},
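{
"cell_type": "markdown",
"id": "d27d8e9f-a0b1-4c23-8e4f-60718293a4b5",
"metadata": {},
"source": [
"The decomposition described above (CrossEntropyLoss = log_softmax followed by nll_loss) can be checked directly against the built-ins; a minimal sketch using class-index targets:\n",
"\n",
"```python\n",
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"logits = torch.randn(3, 5)\n",
"labels = torch.tensor([2, 3, 0])  # class indices\n",
"\n",
"loss_composed = F.nll_loss(F.log_softmax(logits, dim=1), labels)\n",
"loss_builtin = F.cross_entropy(logits, labels)\n",
"print(torch.allclose(loss_composed, loss_builtin))  # True\n",
"```"
]
},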
{
"cell_type": "markdown",
"id": "dbf78501-f5be-4008-986c-d331d531491f",
"metadata": {},
"source": [
"手动实现Flatten。\n",
"\n",
"原理很简单,就是把多维的张量拉直成一个向量。"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "74322629-8325-4823-b80f-f28182d577c1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Flatten之前的x\n",
"tensor([[[1., 2.],\n",
" [3., 4.]],\n",
"\n",
" [[5., 6.],\n",
" [7., 8.]]])\n",
"My_Flatten之后的x\n",
"tensor([[1., 2., 3., 4.],\n",
" [5., 6., 7., 8.]])\n",
"nn.Flatten之后的x\n",
"tensor([[1., 2., 3., 4.],\n",
" [5., 6., 7., 8.]])\n"
]
}
],
"source": [
"class My_Flatten:\n",
" def __call__(self, x: torch.Tensor):\n",
" return self.forward(x)\n",
"\n",
" def forward(self, x: torch.Tensor):\n",
" x = x.view(x.shape[0], -1)\n",
" return x\n",
"\n",
"\n",
"# 测试\n",
"my_flatten = My_Flatten()\n",
"nn_flatten = nn.Flatten()\n",
"x = torch.tensor([[[1., 2.],\n",
" [3., 4.]],\n",
" [[5., 6.],\n",
" [7., 8.]]])\n",
"print(f\"Flatten之前的x\\n{x}\")\n",
"x_my_flatten = my_flatten(x)\n",
"print(f\"My_Flatten之后的x\\n{x_my_flatten}\")\n",
"x_nn_flatten = nn_flatten(x)\n",
"print(f\"nn.Flatten之后的x\\n{x_nn_flatten}\")"
]
},
{
"cell_type": "markdown",
"id": "35aee905-ae37-4faa-a7f1-a04cd8579f78",
"metadata": {},
"source": [
"手动实现softmax回归模型。\n",
"\n",
"模型很简单主要由一个Flatten层和一个线性层组成。\n",
"\n",
"Flatten层主要用于将2维的图像展开直接作为1维的特征量输入网络。"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "bb31a75e-464c-4b94-b927-b219a765e35d",
"metadata": {},
"outputs": [],
"source": [
"class Model_3_1:\n",
" def __init__(self, num_classes):\n",
" self.flatten = My_Flatten()\n",
" self.linear = My_Linear(28 * 28, num_classes)\n",
" self.params = self.linear.params\n",
"\n",
" def __call__(self, x: torch.Tensor):\n",
" return self.forward(x)\n",
"\n",
" def forward(self, x: torch.Tensor):\n",
" x = self.flatten(x)\n",
" x = self.linear(x)\n",
" return x\n",
"\n",
" def to(self, device: str):\n",
" for param in self.params:\n",
" param.data = param.data.to(device=device)\n",
" return self\n",
"\n",
" def parameters(self):\n",
" return self.params"
]
},
{
"cell_type": "markdown",
"id": "17e686d1-9c9a-4727-8fdc-9990d348c523",
"metadata": {},
"source": [
"训练与测试过程与之前手动实现的几乎一致。由于数据集的变化,对应超参数也进行了调整。\n",
"\n",
"数据集也使用了现成的FashionMNIST数据集且划分了训练集和测试集。\n",
"\n",
"FashionMNIST数据集直接调用API获取。数据集的image为28*28的单通道灰白图片label为单个数值标签。\n",
"\n",
"训练softmax回归模型。\n",
"进行如下步骤:\n",
"1. 初始化超参数\n",
"2. 获取数据集\n",
"3. 初始化模型\n",
"4. 定义损失函数和优化器\n",
"5. 训练\n",
" 1. 从训练dataloader中获取批量数据\n",
" 2. 传入模型\n",
" 3. 使用损失函数计算与ground_truth的损失\n",
" 4. 使用优化器进行反向传播\n",
" 5. 循环以上步骤\n",
"6. 测试\n",
" 1. 从测试dataloader中获取批量数据\n",
" 2. 传入模型\n",
" 3. 将预测值与ground_truth进行比较得出正确率\n",
" 4. 对整个训练集统计正确率,从而分析训练效果"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "d816dae1-5fbe-4c29-9597-19d66b5eb6b4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10, Loss: 87.64246368408203, Acc: 0.45329999923706055\n",
"Epoch 2/10, Loss: 42.025726318359375, Acc: 0.5523999929428101\n",
"Epoch 3/10, Loss: 34.06425094604492, Acc: 0.5947999954223633\n",
"Epoch 4/10, Loss: 30.135021209716797, Acc: 0.620199978351593\n",
"Epoch 5/10, Loss: 27.43822479248047, Acc: 0.6401000022888184\n",
"Epoch 6/10, Loss: 25.72039031982422, Acc: 0.6525999903678894\n",
"Epoch 7/10, Loss: 24.28335952758789, Acc: 0.6638999581336975\n",
"Epoch 8/10, Loss: 23.18214988708496, Acc: 0.671999990940094\n",
"Epoch 9/10, Loss: 22.18520164489746, Acc: 0.680899977684021\n",
"Epoch 10/10, Loss: 21.393451690673828, Acc: 0.6875999569892883\n"
]
}
],
"source": [
"learning_rate = 5e-1\n",
"num_epochs = 10\n",
"batch_size = 4096\n",
"num_classes = 10\n",
"device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
"\n",
"transform = transforms.Compose(\n",
" [\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.5,), (1.0,)),\n",
" ]\n",
")\n",
"train_dataset = datasets.FashionMNIST(root=\"./dataset\", train=True, transform=transform, download=True)\n",
"test_dataset = datasets.FashionMNIST(root=\"./dataset\", train=False, transform=transform, download=True)\n",
"train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size,shuffle=True, num_workers=14, pin_memory=True)\n",
"test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size,shuffle=True, num_workers=14, pin_memory=True)\n",
"\n",
"model = Model_3_1(num_classes).to(device)\n",
"criterion = My_CrossEntropyLoss()\n",
"optimizer = My_Optimizer(model.parameters(), lr=learning_rate)\n",
"\n",
"for epoch in range(num_epochs):\n",
" total_epoch_loss = 0\n",
" for images, targets in train_loader:\n",
" optimizer.zero_grad()\n",
"\n",
" images = images.to(device)\n",
" targets = targets.to(device).to(dtype=torch.long)\n",
"\n",
" one_hot_targets = my_one_hot(targets, num_classes=num_classes).to(device).to(dtype=torch.long)\n",
"\n",
" outputs = model(images)\n",
" loss = criterion(outputs, one_hot_targets)\n",
" total_epoch_loss += loss\n",
"\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" total_acc = 0\n",
" with torch.no_grad():\n",
" for image, targets in test_loader:\n",
" image = image.to(device)\n",
" targets = targets.to(device)\n",
" outputs = model(image)\n",
" total_acc += (outputs.argmax(1) == targets).sum()\n",
" print(f\"Epoch {epoch + 1}/{num_epochs}, Loss: {total_epoch_loss}, Acc: {total_acc / len(test_dataset)}\")"
]
},
{
"cell_type": "markdown",
"id": "a49d0165-aeb7-48c0-9b67-956bb08cb356",
"metadata": {},
"source": [
"在这里我遇到了梯度爆炸的问题。\n",
"\n",
"原来我在数据预处理中使用`transforms.Normalize((0.5,), (0.5,))`进行归一化,但是这样导致了梯度爆炸。\n",
"\n",
"将第二个参数方差改为1.0后,成功解决了梯度爆炸的问题。"
]
},
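{
"cell_type": "markdown",
"id": "e38e9fa0-b1c2-4d34-8f50-718293a4b5c6",
"metadata": {},
"source": [
"Instead of hand-tuning the normalization constants, they can also be computed from the training data itself; a sketch (the printed values are approximate, commonly cited as about 0.286 and 0.353 for FashionMNIST):\n",
"\n",
"```python\n",
"import torch\n",
"from torch.utils.data import DataLoader\n",
"from torchvision import datasets, transforms\n",
"\n",
"# Estimate the dataset mean/std for use in transforms.Normalize\n",
"ds = datasets.FashionMNIST(root=\"./dataset\", train=True, download=True,\n",
"                           transform=transforms.ToTensor())\n",
"loader = DataLoader(ds, batch_size=len(ds))\n",
"images, _ = next(iter(loader))  # one big batch of shape (60000, 1, 28, 28)\n",
"print(images.mean().item(), images.std().item())\n",
"```"
]
},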
{
"cell_type": "markdown",
"id": "3ef5240f-8a11-4678-bfce-f1cbc7e71b77",
"metadata": {},
"source": [
"## 问题2\n",
"\n",
"**利用torch.nn实现softmax回归在Fashion-MNIST数据集上进行训练和测试并从loss训练集以及测试集上的准确率等多个角度对结果进行分析**"
]
},
{
"cell_type": "markdown",
"id": "5c4a88c6-637e-4af5-bed5-f644685dcabc",
"metadata": {},
"source": [
"使用torch.nn实现模型。\n",
"\n",
"将之前的Model_3_1中的手动实现函数改为torch.nn内置函数即可再加上继承nn.Module以使用torch.nn内置模型模板特性。"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0163b9f7-1019-429c-8c29-06436d0a4c98",
"metadata": {},
"outputs": [],
"source": [
"class Model_3_2(nn.Module):\n",
" def __init__(self, num_classes):\n",
" super(Model_3_2, self).__init__()\n",
" self.flatten = nn.Flatten()\n",
" self.linear = nn.Linear(28 * 28, num_classes)\n",
"\n",
" def forward(self, x: torch.Tensor):\n",
" x = self.flatten(x)\n",
" x = self.linear(x)\n",
" return x"
]
},
{
"cell_type": "markdown",
"id": "6e765ad7-c1c6-4166-bd7f-361666bd4016",
"metadata": {},
"source": [
"训练与测试过程与之前手动实现的几乎一致。"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "6d241c05-b153-4f56-a845-0f2362f6459b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10, Loss: 19.15451431274414, Acc: 0.7202000021934509\n",
"Epoch 2/10, Loss: 12.260371208190918, Acc: 0.7486000061035156\n",
"Epoch 3/10, Loss: 10.835549354553223, Acc: 0.7615999579429626\n",
"Epoch 4/10, Loss: 10.09542179107666, Acc: 0.7701999545097351\n",
"Epoch 5/10, Loss: 9.626176834106445, Acc: 0.777899980545044\n",
"Epoch 6/10, Loss: 9.264442443847656, Acc: 0.7854999899864197\n",
"Epoch 7/10, Loss: 9.017412185668945, Acc: 0.7879999876022339\n",
"Epoch 8/10, Loss: 8.786051750183105, Acc: 0.7915999889373779\n",
"Epoch 9/10, Loss: 8.613431930541992, Acc: 0.79749995470047\n",
"Epoch 10/10, Loss: 8.462657928466797, Acc: 0.7996999621391296\n"
]
}
],
"source": [
"learning_rate = 5e-2\n",
"num_epochs = 10\n",
"batch_size = 4096\n",
"num_classes = 10\n",
"device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
"\n",
"transform = transforms.Compose(\n",
" [\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.5,), (0.5,)),\n",
" ]\n",
")\n",
"train_dataset = datasets.FashionMNIST(root=\"./dataset\", train=True, transform=transform, download=True)\n",
"test_dataset = datasets.FashionMNIST(root=\"./dataset\", train=False, transform=transform, download=True)\n",
"train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=14, pin_memory=True)\n",
"test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=True, num_workers=14, pin_memory=True)\n",
"\n",
"model = Model_3_2(num_classes).to(device)\n",
"criterion = nn.CrossEntropyLoss()\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n",
"\n",
"for epoch in range(num_epochs):\n",
" total_epoch_loss = 0\n",
" model.train()\n",
" for images, targets in train_loader:\n",
" optimizer.zero_grad()\n",
"\n",
" images = images.to(device)\n",
" targets = targets.to(device)\n",
"\n",
" one_hot_targets = nn.functional.one_hot(targets, num_classes=num_classes).to(device).to(dtype=torch.float32)\n",
"\n",
" outputs = model(images)\n",
" loss = criterion(outputs, one_hot_targets)\n",
" total_epoch_loss += loss\n",
"\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" model.eval()\n",
" total_acc = 0\n",
" with torch.no_grad():\n",
" for image, targets in test_loader:\n",
" image = image.to(device)\n",
" targets = targets.to(device)\n",
" outputs = model(image)\n",
" total_acc += (outputs.argmax(1) == targets).sum()\n",
" print(f\"Epoch {epoch + 1}/{num_epochs}, Loss: {total_epoch_loss}, Acc: {total_acc / len(test_dataset)}\")"
]
},
{
"cell_type": "markdown",
"id": "59555b67-1650-4e1a-a98e-7906878bf3d0",
"metadata": {},
"source": [
"与手动实现的softmax回归相比较nn.CrossEntropyLoss比手动实现的My_CrossEntropyLoss更加稳定对输入数据的兼容性更强没有出现梯度爆炸的情况。\n",
"\n",
"总体表现上torch.nn的内置功能相对手动实现的功能正确率提升更快最终正确率更高。"
]
},
{
"cell_type": "markdown",
"id": "f40431f2-e77b-4ead-81a3-ff6451a8e452",
"metadata": {},
"source": [
"# 实验心得体会\n",
"\n",
"通过完成本次Pytorch基本操作实验让我对Pytorch框架有了更加深入的理解。我接触深度学习主要是在大语言模型领域比较熟悉微调大模型但是涉及到底层的深度学习知识我还有很多短板和不足。这次实验对我这方面的锻炼让我收获良多。\n",
"\n",
"首先是数据集的设置。如果数据没有合理进行归一化,很容易出现梯度爆炸。这是在我以前直接使用图片数据集的经历中没有遇到过的问题。\n",
"\n",
"在实现logistic回归模型时通过手动实现各个组件如优化器、线性层等让我对这些模块的工作原理有了更清晰的认识。尤其是在实现广播机制时需要充分理解张量操作的维度变换规律。而使用Pytorch内置模块进行实现时通过继承nn.Module可以自动获得许多功能使代码更加简洁。\n",
"\n",
"在实现softmax回归时则遇到了更大的困难。手动实现的模型很容易出现梯度爆炸的问题而使用Pytorch内置的损失函数和优化器则可以稳定训练。这让我意识到了选择合适的优化方法的重要性。另外Pytorch强大的自动微分机制也是构建深度神经网络的重要基础。\n",
"\n",
"通过这个实验让我对Pytorch框架有了更加直观的感受也让我看到了仅靠基础模块搭建复杂模型的难点所在。这些经验对我后续使用Pytorch构建数据集模型会很有帮助。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}