BBYR Achieve
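This is a mirrored post. Source: 北邮人论坛 (BYR Forum) / ml-dm / #34877. Synced on 2019/7/28.
This mirror source has not been updated for more than 30 days; the post may have been deleted on the source site.
ML_DM · Posted by bot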

A PyTorch bug

Caohf
2019/7/28 · Mirror synced · 3 replies
I've just started learning and tried to run a model, but it fails with the following error:

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

The traceback points into the train function, shown here:

```
def train(trainloader, model, criterion,optimizer,epoch):
    batch_time=AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    model.train()
    end=time.time()
    with torch.no_grad():
        for i, (input,target) in enumerate(trainloader):
            #measure data loading time
            data_time.update(time.time()-end)
            input,target=input.cuda(),target.cuda()
            #compute output
            output =model(input)
            loss=criterion(output,target)
            #measure accuracy and record loss
            prec = accuracy(output, target)[0]
            losses.update(loss.item(), input.size(0))
            top1.update(prec.item(), input.size(0))
            #compute gradient and do SGD step
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            #measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()
            if i%20==0:
                print('Epoch: [{0}][{1}/{2}]\t'
                      'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                      'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                      'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                      'Prec {top1.val:.3f}% ({top1.avg:.3f}%)'.format(
                          epoch, i, len(trainloader), batch_time=batch_time,
                          data_time=data_time, loss=losses, top1=top1))
            del output
```

The main program:

```
def main():
    use_gpu=torch.cuda.is_available()
    gpu_ids = [0]
    device_ids=range(torch.cuda.device_count())
    # print(len(device_ids))
    if use_gpu:
        model=densenet_BC_cifar(depth=100,k=40,num_classes=100)
        if not os.path.exists('result'):
            os.makedirs('result')
        fdir='result/denseNet/cifar100'
        if not os.path.exists(fdir):
            os.makedirs(fdir)
        device = torch.device("cuda:" + str(device_ids[0]))
        model=model.to(device) #copy the model to the GPU here; the default is cuda('0')
        criterion=nn.CrossEntropyLoss().cuda()
        optimizer = optim.SGD(model.parameters(), 0.1, momentum=0.9, weight_decay=1e-4)
        cudnn.benchmark=True
    else:
        print('Cuda is not available')
    # validate(testloader,model,criterion)
    for epoch in range(0,20):
        adjust_learning_rate(optimizer,epoch)
        #train the epoch
        train(trainloader,model,criterion,optimizer,epoch)
        #evaluate on test
        prec=validate(testloader,model,criterion)
        #remember the best and save checkpoint
        is_best=pre>best_prec
        best_prec=max(prec,best_prec)
        save_checkpoint({
            'epoch':epoch+1,
            'state_dict':model.state_dict(),
            'best_prec':best_prec,
            'optimizer':optimizer.state_dict(),
        },is_best,fdir)
```

I'd really appreciate any pointers from the experts here.
3 replies
Caohf (bot) #1 · 2019/7/28
This is with torch 1.1.
Doffe (bot) #2 · 2019/7/28
Try removing with torch.no_grad()?
【 Quoting Caohf (守法公民): 】
: I've just started learning and tried to run a model, but it fails with the following error:
: ...................
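A minimal sketch of why that suggestion applies, assuming a small CPU-only stand-in for the model and data in the post (the `nn.Linear` layer, tensor shapes, and variable names here are hypothetical, not from the original code). A forward pass run inside `torch.no_grad()` produces a loss with no autograd graph attached, so calling `backward()` on it raises a `RuntimeError`; the same forward pass outside the context backpropagates normally.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the model, loss, and one batch (CPU-only).
model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(8, 4)
targets = torch.randint(0, 3, (8,))

# Forward pass inside no_grad: autograd records no graph for this loss.
with torch.no_grad():
    loss_no_grad = criterion(model(inputs), targets)

no_grad_requires_grad = loss_no_grad.requires_grad  # False: nothing to backprop through

# Calling backward() on such a loss fails with a RuntimeError.
try:
    loss_no_grad.backward()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# Same forward pass outside no_grad: the loss is attached to the graph.
loss = criterion(model(inputs), targets)
loss.backward()
grads_populated = all(p.grad is not None for p in model.parameters())
print(grads_populated)
```

In the posted train function, this corresponds to deleting the `with torch.no_grad():` line and dedenting the loop body by one level.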
Caohf (bot) #3 · 2019/7/29
Solved, thanks!
【 Quoting Doffe (Doffe): 】
: Try removing with torch.no_grad()?
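For reference, one common arrangement keeps `torch.no_grad()` out of the training loop and uses it only during evaluation, where no gradients are needed. The sketch below assumes this split; the post does not show its `validate` function, so `train_one_epoch`, `evaluate`, and the toy data are hypothetical names used purely for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_one_epoch(loader, model, criterion, optimizer):
    """Training pass: gradients must be tracked, so no torch.no_grad() here."""
    model.train()
    total_loss = 0.0
    for inputs, targets in loader:
        loss = criterion(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)


def evaluate(loader, model, criterion):
    """Evaluation pass: no backward() is called, so no_grad is safe here."""
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for inputs, targets in loader:
            total_loss += criterion(model(inputs), targets).item()
    return total_loss / len(loader)


# Toy CPU data standing in for trainloader / testloader (hypothetical).
torch.manual_seed(0)
toy_loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
toy_model = nn.Linear(4, 3)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1, momentum=0.9)

weights_before = toy_model.weight.detach().clone()
train_loss = train_one_epoch(toy_loader, toy_model, toy_criterion, toy_optimizer)
eval_loss = evaluate(toy_loader, toy_model, toy_criterion)
weights_changed = not torch.equal(weights_before, toy_model.weight.detach())
print(train_loss, eval_loss, weights_changed)
```

Skipping graph construction during evaluation is the usual reason to wrap that loop in `torch.no_grad()`; the training loop needs the graph so that `loss.backward()` can compute gradients.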