终于碰上torch.nn.Embedding

it2023-05-15 130

转载：https://my.oschina.net/earnp/blog/1113896 https://zhuanlan.zhihu.com/p/69898755?from_voters_page=true

词向量实现(word embedding)是通过nn.Embedding函数来实现。

一个保存了固定字典和大小的简单查找表。这个模块常用来保存词嵌入和用下标检索它们。模块的输入是一个下标的列表，输出是对应的词嵌入。

import numpy as np import torch import torch.nn as nn import torch.nn.functional as F from torch.autograd import Variable word2idx = {'hello':0, 'world':1, 'goodbye':2} embeds = nn.Embedding(3,5) hello_idx = torch.LongTensor([word2idx['hello']]) hello_idx = Variable(hello_idx) hello_embed = embeds(hello_idx) print(hello_embed) tensor([[ 0.4258, 1.6828, -0.8221, -1.5680, -0.6643]], grad_fn=<EmbeddingBackward>)

解释： 1）nn.Embedding(3, 5)：表示有3个词，每个词用维度为5的向量表示。 2）每个单词我们需要用一个数字去表示他，这样我们需要hello的时候，就用0来表示它。注意: 这里的词向量的建立只是初始的词向量，并没有经过任何学习优化。即正态分布N(0,1)的随机赋值。办法： 1）我们需要建立神经网络通过训练来调整word embedding里面的参数使得word embedding每一个词向量能够表示每一个不同的词。 2）可以加载训练好的词向量，比如glove, word2vec。可以节省训练时间，并可能取得更好的训练结果。

细看

torch.nn.Embedding( num_embeddings, #字典中词的个数 embedding_dim, #embedding的维度 padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)

1）padding_idx (int, optional) - 如果提供的话，输出遇到此下标时用零填充

2）max_norm (float, optional) - 如果提供的话，会重新归一化词嵌入，使它们的范数小于提供的值

3）norm_type (float, optional) - 对于max_norm选项计算p范数时的p

4）scale_grad_by_freq (boolean, optional) - 如果提供的话，会根据字典中单词频率缩放梯度

5）weight weight (Tensor) -形状为 (num_embeddings, embedding_dim)的模块中可学习的权值

这是网络层，所以有输入输出： 6）输入： LongTensor (N, W)，必须， N = mini-batch；W = 每个mini-batch中提取的下标数（序列长度） 7）输出： (N, W, embedding_dim)

加载预训练词向量

import torch #创建一个词向量矩阵 word_embeddings = torch.nn.Embedding(vocab_size,embedding_dim) #np_path是一个存储预训练词向量的文件路径 pretrain_embedding = np.array(np.load(np_path),dtype = 'float32') #思路是将np.ndarray形式的词向量转换为pytorch的tensor，再复制到原来创建的词向量矩阵中 word_embeddings.weight.data.copy_(torch.from_numpy(pretrain_embedding)) #方法二 #word_embeddings.weight = nn.Parameter(torch.FloatTensor(pretrain_embedding))

为每个句子建立索引结构，list[[sentence1],[sentence2]]。以字典的索引来说，最终建立的就是[[1,2,3],[1,4,5,6]]。这样长短不一的句子。

接下来要进行padding的操作。由于tensor结构中都是等长的，所以要对上面那样的句子做padding操作后再利用nn.Embedding来进行词的初始化。padding后的可能是这样的结构[[1,2,3,0],[1,4,5,6]]。其中0作为填充。

调整维度（batch，seq_len, embedding_size）——>（seq_len，batch, embedding_size）

batch有3个样例，RNN的每一步要输入每个样例的一个单词，一次输入batch_size个样例，所以batch要按list外层是时间步数(即序列长度)，list内层是batch_size排列。

使用itertools模块的zip_longest函数。而且，使用这个函数，连填充这一步都可以省略，

batch = list(itertools.zip_longest(batch,fillvalue=PAD)) #fillvalue就是要填充的值，强制转成list

参数训练可以单独训练或者跟随整个网络一起训练（看实验需要）

李宏毅homework——hw4中介绍传参： fix_embedding = True embedding_dim=250, hidden_dim=250, num_layers=1, dropout=0.5,

embedding=self.embedding_matrix，追溯tracing

self.embedding_matrix.append(self.embedding[word]) self.embedding=Word2Vec.load(self.w2v_path) from gensim.models import Word2Vec

class LSTM_Net(nn.Module): def __init__(self, embedding, embedding_dim, hidden_dim, num_layers, dropout=0.5, fix_embedding=True): super(LSTM_Net, self).__init__() # 製作 embedding layer self.embedding = torch.nn.Embedding(embedding.size(0),embedding.size(1)) self.embedding.weight = torch.nn.Parameter(embedding) # 是否將 embedding fix住，如果fix_embedding為False，在訓練過程中，embedding也會跟著被訓練 self.embedding.weight.requires_grad = False if fix_embedding else True self.embedding_dim = embedding.size(1) self.hidden_dim = hidden_dim self.num_layers = num_layers self.dropout = dropout self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, batch_first=True) self.classifier = nn.Sequential( nn.Dropout(dropout), nn.Linear(hidden_dim, 1), nn.Sigmoid() ) def forward(self, inputs): inputs = self.embedding(inputs) x, _ = self.lstm(inputs, None) # x 的 dimension (batch, seq_len, hidden_size) # 取用 LSTM 最後一層的 hidden state x = x[:, -1, :] x = self.classifier(x) return x

https://www.cnblogs.com/dogecheng/p/11565530.html 推荐！使用keras的embedding层实战型的大佬呀！！！！包括IBMD情感分析去标点符号

总结

我一直纠结的问题： 1）如何将gensim库预训练的model应用到nn.Embedding，为什么self.embedding.weight = torch.nn.Parameter(embedding) 答：我们不是得到（max_word，feature），其实这是gensim.model.Word2Vec模型的某一层f1（我感觉是第一层），该网络模型后面还有隐藏层。如果基于CBOW算法，输入的是4个词，输出的是中间词的概率。而训练过程中，我们就认为f1层已经学习到语义信息。f1层既有表达每个词的向量的作用，也有作为网络参数的作用。 2）nn.Embedding层到底有什么作用输入索引，返回索引对应的词向量

附上官方图吧，现在清晰多了

最新回复(0)