[Darknet] The mAP computation function validate_detector_map

2023-07-23

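A while ago I worked on a collaboration with Baidu and needed to measure the mAP of their model, so I looked into how Darknet actually computes mAP.

The prototype of validate_detector_map is: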

float validate_detector_map(char *datacfg, char *cfgfile, char *weightfile, float thresh_calc_avg_iou, const float iou_thresh, const int map_points, int letter_box, network *existing_net)

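The parameters are:

datacfg - the .data file
cfgfile - the .cfg file
weightfile - the weights file
thresh_calc_avg_iou - the confidence threshold used when computing precision and recall (note: mAP does not depend on this value)
iou_thresh - the IoU threshold, i.e. how much a detection must overlap a ground-truth (gt) box to count as a correct detection
map_points - how many recall points are used to compute mAP. More points give a more precise result, and fewer points tend to give a slightly lower mAP. The default is 0, which uses all points. The comments in the source list the usual conventions:

    // MS COCO - uses 101-Recall-points on PR-chart.
    // PascalVOC2007 - uses 11-Recall-points on PR-chart.
    // PascalVOC2010-2012 - uses Area-Under-Curve on PR-chart.
    // ImageNet - uses Area-Under-Curve on PR-chart.

letter_box - whether to use letterbox resizing, which keeps the aspect ratio of the original image
existing_net - an already-built network, if there is one. When map is called during training the network already exists; when map is called directly, a new network is built from the config file.

The core code is analyzed step by step below.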

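1. Images are processed in groups of 4. For each image, inference is run first to get the detections, and detections below the threshold are then filtered out (note: the threshold passed here is 0.005, because we want to keep essentially all detections). hier_thresh was used by YOLOv2 and has no effect here anymore.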

    for (t = 0; t < nthreads && i + t - nthreads < m; ++t) {
        const int image_index = i + t - nthreads;
        char *path = paths[image_index];
        char *id = basecfg(path);
        float *X = val_resized[t].data;
        network_predict(net, X);

        int nboxes = 0;
        float hier_thresh = 0;
        detection *dets;
        if (args.type == LETTERBOX_DATA) {
            dets = get_network_boxes(&net, val[t].w, val[t].h, thresh, hier_thresh, 0, 1, &nboxes, letter_box);
        }
        else {
            dets = get_network_boxes(&net, 1, 1, thresh, hier_thresh, 0, 0, &nboxes, letter_box);
        }

        if (nms) {
            if (l.nms_kind == DEFAULT_NMS) do_nms_sort(dets, nboxes, l.classes, nms);
            else diounms_sort(dets, nboxes, l.classes, nms, l.nms_kind, l.beta_nms);
        }
        // (excerpt: the loop body continues in steps 2 and 3)

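2. Once the network's detections are available, they are all stored in detections, an array of box_prob structs. Each entry holds the bbox, the prob, the image index, the class, a flag saying whether it matched a gt, and the index of that gt. This detections array is what the mAP is computed from later. For every detection with prob greater than 0 (in practice greater than 0.005, because probabilities below that threshold are set to 0 when the boxes are extracted, and NMS additionally zeroes the boxes it suppresses), the code looks for the gt of the same class whose IoU with the detection exceeds the threshold and is the largest. If such a gt is found, truth_flag and unique_truth_index are updated.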

    for (i = 0; i < nboxes; ++i) {
        int class_id;
        for (class_id = 0; class_id < classes; ++class_id) {
            float prob = dets[i].prob[class_id];
            if (prob > 0) {
                detections_count++;
                detections = (box_prob*)xrealloc(detections, detections_count * sizeof(box_prob));
                detections[detections_count - 1].b = dets[i].bbox;
                detections[detections_count - 1].p = prob;
                detections[detections_count - 1].image_index = image_index;
                detections[detections_count - 1].class_id = class_id;
                detections[detections_count - 1].truth_flag = 0;
                detections[detections_count - 1].unique_truth_index = -1;

                int truth_index = -1;
                float max_iou = 0;
                for (j = 0; j < num_labels; ++j) {
                    box t = { truth[j].x, truth[j].y, truth[j].w, truth[j].h };
                    float current_iou = box_iou(dets[i].bbox, t);
                    if (current_iou > iou_thresh && class_id == truth[j].id) {
                        if (current_iou > max_iou) {
                            max_iou = current_iou;
                            truth_index = unique_truth_count + j;
                        }
                    }
                }

                // best IoU
                if (truth_index > -1) {
                    detections[detections_count - 1].truth_flag = 1;
                    detections[detections_count - 1].unique_truth_index = truth_index;
                }
                // (excerpt: continues in step 3, still inside "if (prob > 0)")
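To make the matching rule in step 2 concrete, here is a minimal standalone sketch. It is not Darknet's code: sketch_box, sketch_truth, sketch_iou and best_truth_match are hypothetical names, and boxes are assumed to be stored as center x/y plus width/height, as in the excerpt above. The function returns the per-image index j; Darknet additionally adds unique_truth_count so that the index is unique across images.

```c
/* Hypothetical stand-ins for Darknet's box and label types
   (center x, center y, width, height). */
typedef struct { float x, y, w, h; } sketch_box;
typedef struct { sketch_box b; int id; } sketch_truth;

/* Overlap length of two 1-D intervals given by center and width.
   The result is negative when the intervals do not overlap. */
static float overlap_1d(float c1, float w1, float c2, float w2)
{
    float l1 = c1 - w1 / 2, l2 = c2 - w2 / 2;
    float r1 = c1 + w1 / 2, r2 = c2 + w2 / 2;
    float left = l1 > l2 ? l1 : l2;
    float right = r1 < r2 ? r1 : r2;
    return right - left;
}

/* Intersection over union of two boxes. */
static float sketch_iou(sketch_box a, sketch_box b)
{
    float w = overlap_1d(a.x, a.w, b.x, b.w);
    float h = overlap_1d(a.y, a.h, b.y, b.h);
    if (w <= 0 || h <= 0) return 0;
    float inter = w * h;
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0 ? inter / uni : 0;
}

/* Returns the index of the same-class gt with the largest IoU above
   iou_thresh, or -1 if there is none. The IoU of the match (0 when
   nothing matched) is written to *best_iou if it is not NULL. */
static int best_truth_match(sketch_box det, int class_id,
                            const sketch_truth *truth, int num_labels,
                            float iou_thresh, float *best_iou)
{
    int best = -1;
    float max_iou = 0;
    for (int j = 0; j < num_labels; ++j) {
        float iou = sketch_iou(det, truth[j].b);
        if (iou > iou_thresh && truth[j].id == class_id && iou > max_iou) {
            max_iou = iou;
            best = j;
        }
    }
    if (best_iou) *best_iou = max_iou;
    return best;
}
```

Note that a detection is only ever compared against gt boxes of its own class, so a box that overlaps a gt perfectly but carries the wrong class id still ends up with truth_flag = 0.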

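3. After a detection has been stored, TP, FP, and the average IoU are updated. The threshold used here is thresh_calc_avg_iou, passed in from outside, and the TP, FP, and average IoU are computed for that one specific threshold. mAP, by contrast, summarizes precision and recall across many thresholds and does not depend on any particular one. found indicates whether the gt matched by the current detection has already been matched before. Suppose the current bbox predicts the gt with index truth_index, but an earlier bbox has already predicted that gt (z runs from checkpoint_detections_count up to, but not including, detections_count - 1, i.e. over the detections already stored for the current image). Since the detections come out of NMS sorted roughly in descending order of prob, the earlier prediction is the TP and the current one is an FP.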

                // calc avg IoU, true-positives, false-positives for required Threshold
                if (prob > thresh_calc_avg_iou) {
                    int z, found = 0;
                    for (z = checkpoint_detections_count; z < detections_count - 1; ++z) {
                        if (detections[z].unique_truth_index == truth_index) {
                            found = 1;
                            break;
                        }
                    }

                    if (truth_index > -1 && found == 0) {
                        avg_iou += max_iou;
                        ++tp_for_thresh;
                        avg_iou_per_class[class_id] += max_iou;
                        tp_for_thresh_per_class[class_id]++;
                    }
                    else {
                        fp_for_thresh++;
                        fp_for_thresh_per_class[class_id]++;
                    }
                }

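4. After all images have been processed, the average IoU and the per-class average IoU are computed. The IoU of every TP has already been added to avg_iou, and every FP contributes an IoU of 0.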

    if ((tp_for_thresh + fp_for_thresh) > 0)
        avg_iou = avg_iou / (tp_for_thresh + fp_for_thresh);

    int class_id;
    for (class_id = 0; class_id < classes; class_id++) {
        if ((tp_for_thresh_per_class[class_id] + fp_for_thresh_per_class[class_id]) > 0)
            avg_iou_per_class[class_id] = avg_iou_per_class[class_id] / (tp_for_thresh_per_class[class_id] + fp_for_thresh_per_class[class_id]);
    }
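Steps 3 and 4 can be sketched together for a single image and a single class. This is only an illustration under simplifying assumptions: sketch_det, sketch_counts and count_tp_fp are hypothetical names, and each detection is assumed to already carry the gt index and IoU found in step 2 (-1 and 0 when nothing matched).

```c
/* Hypothetical per-detection record: confidence, matched gt index
   (-1 = no match), and the IoU with that gt. */
typedef struct { float p; int truth_index; float iou; } sketch_det;

/* TP/FP counts and the sum of TP IoUs at one confidence threshold. */
typedef struct { int tp, fp; float iou_sum; } sketch_counts;

/* dets must be in the order in which they were stored for the image.
   A detection above conf_thresh is a TP only if it matched a gt that no
   earlier detection on the image has claimed; otherwise it is an FP.
   As in the excerpt above, the duplicate check scans every earlier
   detection, including those at or below conf_thresh. */
static sketch_counts count_tp_fp(const sketch_det *dets, int n, float conf_thresh)
{
    sketch_counts c = {0, 0, 0.0f};
    for (int k = 0; k < n; ++k) {
        if (!(dets[k].p > conf_thresh)) continue;
        int found = 0;
        for (int z = 0; z < k; ++z) {
            if (dets[z].truth_index == dets[k].truth_index) {
                found = 1;
                break;
            }
        }
        if (dets[k].truth_index > -1 && !found) {
            c.tp++;
            c.iou_sum += dets[k].iou;
        } else {
            c.fp++;
        }
    }
    return c;
}

/* Step 4: FPs add nothing to iou_sum but still count in the divisor. */
static float average_iou(sketch_counts c)
{
    return (c.tp + c.fp) > 0 ? c.iou_sum / (c.tp + c.fp) : 0;
}
```

For example, with detections (p, truth_index, iou) = (0.9, 0, 0.8), (0.7, 0, 0.6), (0.6, -1, 0), (0.1, 1, 0.9) and a threshold of 0.25, the first is a TP, the second is a duplicate and therefore an FP, the third is an unmatched FP, and the fourth is skipped. That gives 1 TP, 2 FP, and an average IoU of 0.8 / 3, which shows how FPs pull the reported average IoU down.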

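5. Next, the AP of each class and the mAP are computed. The detections are first sorted in descending order of prob, so detections[0] has the largest prob across all classes. rank is the confidence rank: rank = 0 corresponds to the largest prob and rank = detections_count - 1 to the smallest.

Now for the meaning of pr. pr is a classes × detections_count array, and pr[class_id][rank] holds the precision/recall statistics of class class_id when only detections with prob at least as large as the prob at that rank are considered, i.e. under the condition prob >= detections[rank].p. That is why pr[class_id][rank].tp is initialized from pr[class_id][rank - 1].tp (and likewise for fp), and why the TP and FP counts at rank are never smaller than those at rank - 1: as rank increases, the required prob decreases, so the set of accepted detections cannot shrink. Finally, at rank == detections_count - 1, every detection is included (all of them have prob above 0.005).

As with found in step 3, truth_flags marks whether a gt has already been matched by some detection. At the prob of each detection, the TP or FP count is incremented depending on whether the detection hit a gt that has not been claimed yet, and precision and recall are then computed.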

    qsort(detections, detections_count, sizeof(box_prob), detections_comparator);

    // for PR-curve
    pr_t** pr = (pr_t**)calloc(classes, sizeof(pr_t*)); // pr[classes][detections_count]
    for (i = 0; i < classes; ++i)
        pr[i] = (pr_t*)calloc(detections_count, sizeof(pr_t));

    for (rank = 0; rank < detections_count; ++rank) {
        if (rank > 0) {
            int class_id;
            for (class_id = 0; class_id < classes; ++class_id) {
                pr[class_id][rank].tp = pr[class_id][rank - 1].tp;
                pr[class_id][rank].fp = pr[class_id][rank - 1].fp;
            }
        }

        box_prob d = detections[rank];
        // if (detected && isn't detected before)
        if (d.truth_flag == 1) {
            if (truth_flags[d.unique_truth_index] == 0) {
                truth_flags[d.unique_truth_index] = 1;
                pr[d.class_id][rank].tp++; // true-positive
            }
            else
                pr[d.class_id][rank].fp++;
        }
        else {
            pr[d.class_id][rank].fp++; // false-positive
        }

        for (i = 0; i < classes; ++i) {
            const int tp = pr[i][rank].tp;
            const int fp = pr[i][rank].fp;
            const int fn = truth_classes_count[i] - tp; // false-negative = objects - true-positive
            pr[i][rank].fn = fn;

            if ((tp + fp) > 0) pr[i][rank].precision = (double)tp / (double)(tp + fp);
            else pr[i][rank].precision = 0;

            if ((tp + fn) > 0) pr[i][rank].recall = (double)tp / (double)(tp + fn);
            else pr[i][rank].recall = 0;
        }
    }
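The cumulative bookkeeping in step 5 is easier to follow for a single class. The sketch below assumes the detections are already sorted by descending confidence and that the duplicate check has been reduced to a 0/1 flag per detection; sketch_pr and build_pr_curve are hypothetical names, not part of Darknet.

```c
/* Hypothetical single-class version of one row of the pr table. */
typedef struct { int tp, fp, fn; double precision, recall; } sketch_pr;

/* is_tp[rank] is 1 if the detection at that rank is the first one to
   claim its gt box, and 0 otherwise (unmatched or duplicate).
   num_truth is the number of gt boxes of this class.
   pr must have room for n entries. */
static void build_pr_curve(const int *is_tp, int n, int num_truth, sketch_pr *pr)
{
    int tp = 0, fp = 0;
    for (int rank = 0; rank < n; ++rank) {
        /* Counts carry over from the previous rank, so they never decrease. */
        if (is_tp[rank]) ++tp; else ++fp;
        int fn = num_truth - tp; /* false-negative = objects - true-positive */
        pr[rank].tp = tp;
        pr[rank].fp = fp;
        pr[rank].fn = fn;
        pr[rank].precision = (tp + fp) > 0 ? (double)tp / (tp + fp) : 0;
        pr[rank].recall = (tp + fn) > 0 ? (double)tp / (tp + fn) : 0;
    }
}
```

With is_tp = {1, 0, 1, 1, 0} and 4 gt boxes, the (precision, recall) pairs by rank are (1, 0.25), (0.5, 0.25), (2/3, 0.5), (0.75, 0.75), (0.6, 0.75). Recall never decreases as rank grows, while precision moves up and down, which is exactly the fluctuation that the AP computation has to smooth out.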

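6. With the precision and recall at every point available, the mAP can now be computed. There are two cases.

When map_points is 0, the precision at every recall point is considered and accumulated, which amounts to the area under the PR curve (note: interpolated precision is used, i.e. the precision assigned to each recall is the maximum precision among all points whose recall is not smaller). Since prob is sorted from high to low, going from large rank to small rank visits recall from high to low, with precision generally going from low to high. recall never increases along the way, but precision can fluctuate. If precision does not rise as recall drops, that point's own precision is ignored and the running maximum last_precision is used for the area instead, until a higher precision is encountered.

When map_points is not 0 it is more direct: for each of the evenly spaced recall points, the code searches for the maximum precision among all points whose recall is at least that value, and the results are averaged. On the same dataset, -points 0 gives a slightly higher mAP than -points 101.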

    for (i = 0; i < classes; ++i) {
        double avg_precision = 0;

        if (map_points == 0) {
            double last_recall = pr[i][detections_count - 1].recall;
            double last_precision = pr[i][detections_count - 1].precision;
            for (rank = detections_count - 2; rank >= 0; --rank) {
                double delta_recall = last_recall - pr[i][rank].recall;
                last_recall = pr[i][rank].recall;

                if (pr[i][rank].precision > last_precision)
                    last_precision = pr[i][rank].precision;

                avg_precision += delta_recall * last_precision;
            }
            //add remaining area of PR curve when recall isn't 0 at rank-1
            double delta_recall = last_recall - 0;
            avg_precision += delta_recall * last_precision;
        }
        // MSCOCO - 101 Recall-points, PascalVOC - 11 Recall-points
        else {
            int point;
            for (point = 0; point < map_points; ++point) {
                double cur_recall = point * 1.0 / (map_points - 1);
                double cur_precision = 0;
                for (rank = 0; rank < detections_count; ++rank) {
                    if (pr[i][rank].recall >= cur_recall) // > or >=
                        if (pr[i][rank].precision > cur_precision)
                            cur_precision = pr[i][rank].precision;
                }
                avg_precision += cur_precision;
            }
            avg_precision = avg_precision / map_points;
        }

        mean_average_precision += avg_precision;
    }
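To compare the two branches on the same PR curve, the sketch below reimplements both for one class. It follows the logic of the excerpt above, but it is a standalone illustration: pr_point, ap_all_points and ap_n_points are hypothetical names, and the guards for empty input are additions that the excerpt does not have.

```c
/* Hypothetical PR point for one class; the array is ordered by rank,
   i.e. by descending detection confidence. */
typedef struct { double precision, recall; } pr_point;

/* map_points == 0: area under the interpolated PR curve, walking from
   the highest recall (last rank) back to the lowest. */
static double ap_all_points(const pr_point *pr, int n)
{
    if (n <= 0) return 0; /* added guard, not in the excerpt */
    double ap = 0;
    double last_recall = pr[n - 1].recall;
    double last_precision = pr[n - 1].precision;
    for (int rank = n - 2; rank >= 0; --rank) {
        double delta_recall = last_recall - pr[rank].recall;
        last_recall = pr[rank].recall;
        /* Keep the running maximum precision seen so far. */
        if (pr[rank].precision > last_precision)
            last_precision = pr[rank].precision;
        ap += delta_recall * last_precision;
    }
    /* Remaining strip between recall 0 and the lowest recall on the curve. */
    ap += last_recall * last_precision;
    return ap;
}

/* map_points > 1: average of the best precision at evenly spaced recalls. */
static double ap_n_points(const pr_point *pr, int n, int map_points)
{
    if (map_points < 2) return 0; /* added guard, avoids division by zero */
    double sum = 0;
    for (int point = 0; point < map_points; ++point) {
        double cur_recall = (double)point / (map_points - 1);
        double cur_precision = 0;
        for (int rank = 0; rank < n; ++rank) {
            if (pr[rank].recall >= cur_recall && pr[rank].precision > cur_precision)
                cur_precision = pr[rank].precision;
        }
        sum += cur_precision;
    }
    return sum / map_points;
}
```

For the five points (precision, recall) = (1, 0.25), (0.5, 0.25), (2/3, 0.5), (0.75, 0.75), (0.6, 0.75), ap_all_points gives 0.25 × 1 + 0.25 × 0.75 + 0.25 × 0.75 = 0.625, while ap_n_points with 11 points gives (3 × 1 + 5 × 0.75 + 3 × 0) / 11 ≈ 0.614. On this toy curve the all-points value is the higher of the two, in line with the observation about -points 0 above.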

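Reading the code on its own can be a bit abstract, so it helps to look at it together with the figure at the end of this article. Every element of the pr array corresponds to one point on the PR chart. Throughout the computation the variable last_precision is maintained, holding the largest precision seen so far. The mAP computation traverses the chart from right to left; picture a point sliding along the whole green line from right to left:

(1) When it moves left, delta_recall is the horizontal distance travelled. last_precision does not change, and the AP grows by the area of the rectangle formed by that horizontal distance and the maximum precision (interpolation).

(2) When it moves up, recall does not change and delta_recall = 0, so the AP does not grow, but last_precision keeps increasing until it reaches the next peak.

That is my personal understanding of the mAP function. Comments and discussion are welcome.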