您现在的位置是:网站首页> 编程资料编程资料

python目标检测yolo3详解预测及代码复现_python_

2023-05-26 393人已围观

简介 python目标检测yolo3详解预测及代码复现_python_

学习前言

对yolo2解析完了之后当然要讲讲yolo3,yolo3与yolo2的差别主要在网络的特征提取部分,实际的解码部分其实差距不大

代码下载

本次教程主要基于github中的项目点击直接下载,该项目相比于yolo3-Keras的项目更容易看懂一些,不过它的许多代码与yolo3-Keras相同。

我保留了预测部分的代码,在实际可以通过执行detect.py运行示例。

链接: https://pan.baidu.com/s/1_xLeytnjBBSL2h2Kj4-YEw

取码:i3hi

实现思路

1、yolo3的预测思路(网络构建思路)

YOLOv3相比于之前的yolo1和yolo2,改进较大,主要改进方向有:

1、使用了残差网络Residual,残差卷积就是进行一次3X3、步长为2的卷积,然后保存该卷积layer,再进行一次1X1的卷积和一次3X3的卷积,并把这个结果加上layer作为最后的结果, 残差网络的特点是容易优化,并且能够通过增加相当的深度来提高准确率。其内部的残差块使用了跳跃连接,缓解了在深度神经网络中增加深度带来的梯度消失问题。

2、提取多特征层进行目标检测,一共提取三个特征层,特征层的shape分别为(13,13,75),(26,26,75),(52,52,75),最后一个维度为75是因为该图是基于voc数据集的,它的类为20种,yolo3只有针对每一个特征层存在3个先验框,所以最后维度为3x25;

如果使用的是coco训练集,类则为80种,最后的维度应该为255 = 3x85,三个特征层的shape为(13,13,255),(26,26,255),(52,52,255)

3、其采用反卷积UmSampling2d设计,逆卷积相对于卷积在神经网络结构的正向和反向传播中做相反的运算,其可以更多更好的提取出特征。

其实际情况就是,输入N张416x416的图片,在经过多层的运算后,会输出三个shape分别为(N,13,13,255),(N,26,26,255),(N,52,52,255)的数据,对应每个图分为13x13、26x26、52x52的网格上3个先验框的位置。

实现代码如下,其在实际调用时,会调用其中的yolo_inference函数,此时获得三个特征层的内容。

def _darknet53(self, inputs, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3): """ Introduction ------------ 构建yolo3使用的darknet53网络结构 Parameters ---------- inputs: 模型输入变量 conv_index: 卷积层数序号,方便根据名字加载预训练权重 weights_dict: 预训练权重 training: 是否为训练 norm_decay: 在预测时计算moving average时的衰减率 norm_epsilon: 方差加上极小的数,防止除以0的情况 Returns ------- conv: 经过52层卷积计算之后的结果, 输入图片为416x416x3,则此时输出的结果shape为13x13x1024 route1: 返回第26层卷积计算结果52x52x256, 供后续使用 route2: 返回第43层卷积计算结果26x26x512, 供后续使用 conv_index: 卷积层计数,方便在加载预训练模型时使用 """ with tf.variable_scope('darknet53'): # 416,416,3 -> 416,416,32 conv = self._conv2d_layer(inputs, filters_num = 32, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 # 416,416,32 -> 208,208,64 conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 64, blocks_num = 1, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) # 208,208,64 -> 104,104,128 conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 128, blocks_num = 2, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) # 104,104,128 -> 52,52,256 conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 256, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) # route1 = 52,52,256 route1 = conv # 52,52,256 -> 26,26,512 conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 512, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) # route2 = 26,26,512 route2 = conv # 26,26,512 -> 13,13,1024 conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 1024, blocks_num = 4, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) # route3 = 13,13,1024 return route1, route2, conv, conv_index # 输出两个网络结果 # 第一个是进行5次卷积后,用于下一次逆卷积的,卷积过程是1X1,3X3,1X1,3X3,1X1 # 第二个是进行5+2次卷积,作为一个特征层的,卷积过程是1X1,3X3,1X1,3X3,1X1,3X3,1X1 def _yolo_block(self, inputs, filters_num, out_filters, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3): """ Introduction ------------ yolo3在Darknet53提取的特征层基础上,又加了针对3种不同比例的feature map的block,这样来提高对小物体的检测率 Parameters ---------- inputs: 输入特征 filters_num: 卷积核数量 out_filters: 最后输出层的卷积核数量 conv_index: 卷积层数序号,方便根据名字加载预训练权重 training: 是否为训练 norm_decay: 在预测时计算moving average时的衰减率 norm_epsilon: 方差加上极小的数,防止除以0的情况 Returns ------- route: 返回最后一层卷积的前一层结果 conv: 返回最后一层卷积的结果 conv_index: conv层计数 """ conv = self._conv2d_layer(inputs, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 route = conv conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index)) conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon) conv_index += 1 conv = self._conv2d_layer(conv, filters_num = out_filters, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index), use_bias = True) conv_index += 1 return route, conv, conv_index # 返回三个特征层的内容 def yolo_inference(self, inputs, num_anchors, num_classes, training = True): """ Introduction ------------ 构建yolo模型结构 Parameters ---------- inputs: 模型的输入变量 num_anchors: 每个grid cell负责检测的anchor数量 num_classes: 类别数量 training: 是否为训练模式 """ conv_index = 1 # route1 = 52,52,256、route2 = 26,26,512、route3 = 13,13,1024 conv2d_26, conv2d_43, conv, conv_index = self._darknet53(inputs, conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon) with tf.variable_scope('yolo'): #--------------------------------------# # 获得第一个特征层 #--------------------------------------# # conv2d_57 = 13,13,512,conv2d_59 = 13,13,255(3x(80+5)) conv2d_57, conv2d_59, conv_index = self._yolo_block(conv, 512, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon) #--------------------------------------# # 获得第二个特征层 #--------------------------------------# conv2d_60 = self._conv2d_layer(conv2d_57, filters_num = 256, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index)) conv2d_60 = self._batch_normalization_layer(conv2d_60, name = "batch_normalization_" + str(conv_index),training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon) conv_index += 1 # unSample_0 = 26,26,256 unSample_0 = tf.image.resize_nearest_neighbor(conv2d_60, [2 * tf.shape(conv2d_60)[1], 2 * tf.shape(conv2d_60)[1]], name='upSample_0') # route0 = 26,26,768 route0 = tf.concat([unSample_0, conv2d_43], axis = -1, name = 'route_0') # conv2d_65 = 52,52,256,conv2d_67 = 26,26,255 conv2d_65, conv2d_67, conv_index = self._yolo_block(route0, 256, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon) #--------------------------------------# # 获得第三个特征层 #--------------------------------------# conv2d_68 = self._conv2d_layer(conv2d_65, filters_num = 128, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index)) conv2d_68 = self._batch_normalization_layer(conv2d_68, name = "batch_normalization_" + str(conv_index), training=training, norm_decay=self.norm_decay, norm_epsilon = self.norm_epsilon) conv_index += 1 # unSample_1 = 52,52,128 unSample_1 = tf.image.resize_nearest_neighbor(conv2d_68, [2 * tf.shape(conv2d_68)[1], 2 * tf.shape(conv2d_68)[1]], name='upSample_1') # route1= 52,52,384 route1 = tf.concat([unSample_1, conv2d_26], axis = -1, name = 'route_1') # conv2d_75 = 52,52,255 _, conv2d_75, _ = self._yolo_block(route1, 128, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon) return [conv2d_59, conv2d_67, conv2d_75] 

2、利用先验框对网络的输出进行解码

yolo3的先验框生成与yolo2的类似,如果不明白先验框是如何生成的,可以看我的上一篇博文

python目标检测yolo2详解及其预测代码复现

其实yolo3的解码与yolo2的解码过程一样,只是对于yolo3而言,其需要对三个特征层进行解码,三个特征层的shape分别为(N,13,13,255),(N,26,26,255),(N,52,52,255)的数据,对应每个图分为13x13、26x26、52x52的网格上3个先验框的位置。

此处需要用到一个循环。

1、将第一个特征层reshape成[-1, 13, 13, 3, 80 + 5],代表169个中心点每个中心点的3个先验框的情况。

2、将80+5的5中的xywh分离出来,0、1是xy相对中心点的偏移量;2、3是宽和高的情况;4是置信度。

3、建立13x13

-六神源码网