VOC dataset: augmenting object detection datasets (rotation, mirroring, brightness, etc.), with source code for VOC-format annotations...

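There are many augmentation schemes for object detection datasets, but the material online is scattered, and I could not find complete, systematic source code that processes an entire dataset. Over the past couple of days I wrote source code for several augmentation schemes myself, all targeting datasets in VOC format!

As long as you have your own annotated data and organize the files in VOC format, you can use this code. For example, if you currently have only a few hundred images, augmenting them this way will expand your dataset considerably.

Source code: github.com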

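0 Rotating images and labels

The hard part is rotating the labels. I wrote an earlier article that introduces the general principle of the rotation:

lim0: Data augmentation for object detection: rotating objects and their labels (zhuanlan.zhihu.com)

The code in that earlier article was only a demo for a single image. This article gives complete code for processing a dataset in VOC format. It involves basic operations on .xml files as well as on files and folders; all of this is elementary, but many of the details are worth writing down: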

File tree of the data to process (VOC format)

|---Annotations
    ----0.xml
    ......
    ----XX.xml
|---JPEGImages
    ----0.jpg
    ......
    ----XX.jpg
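Since every augmentation below walks this tree and needs the image together with its annotation, here is a minimal sketch of that pairing step. It is not the author's code: the function name `list_voc_pairs` and the `img_ext` parameter are hypothetical, and it assumes each image `N.jpg` has its annotation at `Annotations/N.xml`, as in the tree above.

```python
import os

def list_voc_pairs(img_dir, anno_dir, img_ext=".jpg"):
    """Return sorted (image_path, annotation_path) pairs present in both folders.

    Hypothetical helper: assumes JPEGImages/N.jpg pairs with Annotations/N.xml.
    """
    pairs = []
    for img_name in sorted(os.listdir(img_dir)):
        stem, ext = os.path.splitext(img_name)
        if ext.lower() != img_ext:
            continue  # skip anything that is not an image of the expected type
        anno_path = os.path.join(anno_dir, stem + ".xml")
        if os.path.isfile(anno_path):
            # keep only images that actually have an annotation file
            pairs.append((os.path.join(img_dir, img_name), anno_path))
    return pairs
```

Skipping images without a matching .xml file keeps a later loop from failing halfway through a large folder; whether to skip or to raise an error is a design choice.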

import math
import os
import xml.etree.ElementTree as ET

import cv2

# Function that rotates an image file
def getRotatedImg(Pi_angle,img_path,img_write_path):
    img = cv2.imread(img_path)
    .....
        ### see GitHub for the full code
    rotated_img = cv2.warpAffine(img, M, (cols, rows))  # the rotated image keeps the same size
    cv2.imwrite(img_write_path,rotated_img)
    return a,b
# Function that processes the .xml file
def getRotatedAnno(Pi_angle,a,b,anno_path,anno_write_path):
    tree = ET.parse(anno_path)
    root = tree.getroot()
    objects = root.findall("object")
    for obj in objects:
        .....
        ### see GitHub for the full code
    tree.write(anno_write_path)  # save the modified XML file

def rotate(angle,img_dir,anno_dir,img_write_dir,anno_write_dir):
    if not os.path.exists(img_write_dir):
        os.makedirs(img_write_dir)
    if not os.path.exists(anno_write_dir):
        os.makedirs(anno_write_dir)
    Pi_angle = -angle * math.pi / 180.0  # in radians, needed later to rotate the coordinates; note the minus sign!!!
    img_names=os.listdir(img_dir)
    for img_name in img_names:
        .....
        ### see GitHub for the full code
        #
        a,b=getRotatedImg(Pi_angle,img_path,img_write_path)
        getRotatedAnno(Pi_angle,a,b,anno_path,anno_write_path)

angles=[-30,30,.....]
img_dir='several/JPEGImages'
anno_dir='several/Annotations'
img_write_dir='Rotated/rotated_JPEGImages'
anno_write_dir='Rotated/rotated_Annotations'
for angle in angles:
    rotate(angle,img_dir,anno_dir,img_write_dir,anno_write_dir)
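The label-rotation step is elided above, so here is a hedged sketch of one common way to do it; it is not necessarily what the GitHub code does. It builds the same 2x3 matrix that `cv2.getRotationMatrix2D` documents for scale 1, applies it to the four corners of a box, and takes the axis-aligned box around the rotated corners. The names `rotation_matrix` and `rotate_box` are hypothetical.

```python
import math

def rotation_matrix(center, angle_deg):
    """2x3 affine matrix for a rotation about `center`.

    Follows the formula documented for cv2.getRotationMatrix2D with scale 1:
    positive angles rotate counter-clockwise in image coordinates.
    """
    cx, cy = center
    alpha = math.cos(math.radians(angle_deg))
    beta = math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]

def rotate_box(box, matrix):
    """Rotate the four corners of (xmin, ymin, xmax, ymax) and return the
    axis-aligned box that encloses them (hypothetical helper)."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    xs = [matrix[0][0] * x + matrix[0][1] * y + matrix[0][2] for x, y in corners]
    ys = [matrix[1][0] * x + matrix[1][1] * y + matrix[1][2] for x, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Example: a 100x100 image rotated by 90 degrees about its center.
M = rotation_matrix((50, 50), 90)
new_box = rotate_box((10, 20, 30, 40), M)
```

The enclosing box is usually looser than the object itself for angles that are not multiples of 90°, and it can extend past the image border, so the result should still be clipped and checked before it is written back to the .xml file.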

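1 Rotation without losing image information

Rotating an image discards part of its content (easy to picture), and it is not easy to tell whether the rotated image still contains the complete object. One option is therefore to first pad the original image into a square based on the lengths of its long and short sides, which ensures that no object information from the original image is lost when the padded image is rotated. After rotating, the image can be cropped using the object's label coordinates: as long as the crop stays clear of the rotated label coordinates, the image can be cropped cleanly. Cropping is optional, but skipping it makes model training less efficient.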
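Padding moves every pixel, so the labels have to move with it. The following sketch works through that arithmetic for the padding scheme used in this section (cols/2 above and below, rows/2 left and right). The helper names are hypothetical, and the 1200x1000 image size in the example is an assumption chosen only for illustration.

```python
import math

def square_padding(rows, cols):
    """Border widths (top, bottom, left, right) for the padding used in this
    section: cols/2 above and below, rows/2 on the left and right."""
    top = bottom = cols // 2
    left = right = rows // 2
    return top, bottom, left, right

def shift_box(box, left, top):
    """Move (xmin, ymin, xmax, ymax) by the left and top border widths."""
    xmin, ymin, xmax, ymax = box
    return xmin + left, ymin + top, xmax + left, ymax + top

# Example with an assumed image of 1200 rows by 1000 columns.
rows, cols = 1200, 1000
top, bottom, left, right = square_padding(rows, cols)
padded_rows = rows + top + bottom   # 2200
padded_cols = cols + left + right   # 2200
shifted = shift_box((606, 489, 855, 1023), left, top)

# The padded side is rows + cols, which is at least the diagonal of the
# original image, so the original content stays inside for any angle.
diagonal = math.hypot(rows, cols)
```

Note that x coordinates shift by the left border (rows/2) and y coordinates by the top border (cols/2), which is easy to get backwards.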

import math
import cv2

img=cv2.imread('14h.jpg')
rows, cols = img.shape[:2]
## Pad the image into a square so that no pixel of the original is lost for any rotation from 0 to 360°
re=cv2.copyMakeBorder(img,int(cols/2),int(cols/2),int(rows/2),int(rows/2),cv2.BORDER_CONSTANT)
cv2.imwrite('re.jpg',re)  # assumed step: the loop below reads 're.jpg', which the original snippet never writes

def getRotatedImg(angle,img_path,img_write_path):
    # angle is in degrees, as cv2.getRotationMatrix2D expects
    img = cv2.imread(img_path)
    rows, cols = img.shape[:2]
    a, b = cols / 2, rows / 2
    M = cv2.getRotationMatrix2D((a, b), angle, 1)
    rotated_img = cv2.warpAffine(img, M, (cols, rows))  # the rotated image keeps the same size
    cv2.imwrite(img_write_path,rotated_img)

for angle in range(0,180,30):
    Pi_angle = -angle * math.pi / 180.0  # radians, for rotating the label coordinates
    img_path='re.jpg'
    img_write_path=str(angle)+'.jpg'
    getRotatedImg(angle, img_path, img_write_path)

# Check that the label has been shifted correctly by the padding
# for origin image: xmin:606 ymin:489 xmax:855 ymax:1023
cv2.rectangle(re,(606+int(rows/2),489+int(cols/2)),(855+int(rows/2),1023+int(cols/2)),(0,255,0),4)
#
def crop():
    pass
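The crop step is left as a stub above. As one possible reading of "crop while staying clear of the label coordinates", the sketch below takes the smallest window that contains every rotated box plus a margin, clips it to the image, and shifts the boxes into the cropped frame. This is an assumption about the intended behavior, not the author's implementation; `crop_window`, `shift_boxes_into_crop`, and `margin` are hypothetical names.

```python
def crop_window(boxes, img_w, img_h, margin=0):
    """Smallest axis-aligned window (x0, y0, x1, y1) that contains all boxes
    plus `margin` pixels on each side, clipped to the image bounds."""
    x0 = max(0, min(b[0] for b in boxes) - margin)
    y0 = max(0, min(b[1] for b in boxes) - margin)
    x1 = min(img_w, max(b[2] for b in boxes) + margin)
    y1 = min(img_h, max(b[3] for b in boxes) + margin)
    return x0, y0, x1, y1

def shift_boxes_into_crop(boxes, window):
    """Express each (xmin, ymin, xmax, ymax) relative to the crop's top-left corner."""
    x0, y0, _, _ = window
    return [(xmin - x0, ymin - y0, xmax - x0, ymax - y0)
            for xmin, ymin, xmax, ymax in boxes]

# Example with two boxes in an assumed 1000x800 image.
boxes = [(100, 200, 300, 400), (250, 150, 500, 350)]
window = crop_window(boxes, img_w=1000, img_h=800, margin=50)
cropped_boxes = shift_boxes_into_crop(boxes, window)
```

With integer window values and an image loaded by OpenCV as a NumPy array, the pixels would then be taken with a slice such as `rotated_img[y0:y1, x0:x1]`, and the `size` element of the annotation would need to be updated to the cropped width and height.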

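2 Mirroring images and labels

This part applies horizontal, vertical, and diagonal mirroring to the image. The image is handled with cv2.flip(); the labels are transformed by subtracting the original pixel coordinate from the image width or height. The three cases differ slightly but are easy to understand; a few details need care, see the code! Note: diagonal mirroring of the original image is equivalent to rotating it by 180°. The full code is too long to paste here, so only the structure is shown; for the full code see the author's GitHub, linked at the start of the article.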

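W and H denote the image width and height

x_out=W-x_in   horizontal: only x changes, y stays the same

y_out=H-y_in   vertical: only y changes, x stays the same

diagonal: both x and y change

The detail to watch is which value ends up as xmin, xmax, ymin, and ymax after the transform; work this out yourself, see the code!!!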
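To make the min/max swap concrete, here is a small sketch covering all three cases in one function. It follows the article's code in adding 1 to each mirrored coordinate; the function name `mirror_box` and the mode letters `"h"`, `"v"`, `"a"` are hypothetical choices for illustration.

```python
def mirror_box(box, width, height, mode):
    """Mirror (xmin, ymin, xmax, ymax) inside a width x height image.

    mode: "h" horizontal, "v" vertical, "a" both axes (diagonal mirror).
    Uses the article's convention x_out = W - x_in + 1.
    """
    if mode not in ("h", "v", "a"):
        raise ValueError("mode must be 'h', 'v' or 'a'")
    xmin, ymin, xmax, ymax = box
    if mode in ("h", "a"):
        # the old xmax becomes the new xmin, and vice versa
        xmin, xmax = width - xmax + 1, width - xmin + 1
    if mode in ("v", "a"):
        # the old ymax becomes the new ymin, and vice versa
        ymin, ymax = height - ymax + 1, height - ymin + 1
    return xmin, ymin, xmax, ymax

# Example in a 100x50 image.
h_box = mirror_box((10, 5, 30, 20), 100, 50, "h")
v_box = mirror_box((10, 5, 30, 20), 100, 50, "v")
a_box = mirror_box((10, 5, 30, 20), 100, 50, "a")
```

Applying the same mirror twice returns the original box, which makes a convenient sanity check for each mode.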

import xml.etree.ElementTree as ET

import cv2

def h_MirrorImg(img_path,img_write_path):
    img = cv2.imread(img_path)
    mirror_img = cv2.flip(img, 1)                # horizontal mirror
    cv2.imwrite(img_write_path,mirror_img)
def v_MirrorImg(img_path,img_write_path):
    pass                                       # vertical mirror
def a_MirrorImg(img_path,img_write_path):
    pass                                    # diagonal mirror

def h_MirrorAnno(anno_path,anno_write_path):
    tree = ET.parse(anno_path)
    root = tree.getroot()
    size=root.find('size')
    w=int(size.find('width').text)
    objects = root.findall("object")
    for obj in objects:
        bbox = obj.find('bndbox')
        x1 = float(bbox.find('xmin').text)
        x2 = float(bbox.find('xmax').text)
        x1=w-x1+1 # mapping between mirrored and original coordinates; adding 1 keeps the result from being 0 or negative
        x2=w-x2+1
        assert x1>0
        assert x2>0
        bbox.find('xmin').text=str(int(x2))      # detail: this is x2, not x1
        bbox.find('xmax').text=str(int(x1))
    tree.write(anno_write_path)  # save the modified XML file

def v_MirrorAnno(anno_path,anno_write_path):          # label handling for the vertical mirror
    pass ### see GitHub for the full code
    tree.write(anno_write_path)  

def a_MirrorAnno(anno_path,anno_write_path):         # label handling for the diagonal mirror

    pass ### see GitHub for the full code
    tree.write(anno_write_path)  

def mirror(img_dir,anno_dir,img_write_dir,anno_write_dir):
    for img_name in img_names:
         ### see GitHub for the full code
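The vertical case is elided above. By analogy with `h_MirrorAnno`, it could look roughly like the sketch below, which reads the image height from the `size` element and swaps `ymin`/`ymax`. This is a guess at the structure rather than the GitHub code; the function name `v_mirror_root` and the inline sample annotation are made up for the example, and it works on an in-memory XML string so that it runs without any files.

```python
import xml.etree.ElementTree as ET

# Made-up minimal VOC-style annotation, used only for this example.
SAMPLE_XML = """<annotation>
  <size><width>100</width><height>50</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>5</ymin><xmax>30</xmax><ymax>20</ymax></bndbox>
  </object>
</annotation>"""

def v_mirror_root(root):
    """Vertically mirror every bndbox under `root` in place (hypothetical helper)."""
    h = int(root.find("size").find("height").text)
    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        y1 = float(bbox.find("ymin").text)
        y2 = float(bbox.find("ymax").text)
        # same +1 convention as h_MirrorAnno; the old ymax becomes the new ymin
        bbox.find("ymin").text = str(int(h - y2 + 1))
        bbox.find("ymax").text = str(int(h - y1 + 1))
    return root

root = v_mirror_root(ET.fromstring(SAMPLE_XML))
mirrored_bbox = root.find("object").find("bndbox")
```

For files on disk, the same function could be applied to `ET.parse(anno_path).getroot()` before calling `tree.write(anno_write_path)`, as the horizontal version does.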

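3 Brightness/contrast transforms for images and labels

This part should also cover color transforms of the image; that will come in the next update!

Brightness and contrast are adjusted with a simple linear transform of the pixels, i.e. a first-degree function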

output_pixel=alpha*(input_pixel)+beta
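A small numeric sketch of this formula on a 1x4 test "image" shows how alpha scales contrast, beta shifts brightness, and clipping keeps values inside 0 to 255. The function name `adjust_brightness_contrast` and the demo array are hypothetical. Converting to float before multiplying is a precaution added here: with an integer alpha, arithmetic on a uint8 array wraps around instead of saturating.

```python
import numpy as np

def adjust_brightness_contrast(img, alpha, beta):
    """Apply output = alpha * input + beta per pixel, clipped to [0, 255]."""
    out = alpha * img.astype(np.float32) + beta  # float math avoids uint8 wraparound
    return np.clip(out, 0, 255).astype(np.uint8)

# Made-up 1x4 grayscale "image" for illustration.
demo = np.array([[0, 100, 200, 250]], dtype=np.uint8)

brighter = adjust_brightness_contrast(demo, 2.0, 10)   # stronger contrast, brighter
darker = adjust_brightness_contrast(demo, 0.5, -20)    # weaker contrast, darker
```

Values that would land above 255 or below 0 are clipped, so very large alpha values flatten bright regions to pure white.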

import xml.etree.ElementTree as ET

import cv2
import numpy as np

def getColorImg(alpha,beta,img_path,img_write_path):
    img = cv2.imread(img_path)
    # convert to float first so that an integer alpha cannot wrap around in uint8
    colored_img = np.uint8(np.clip((alpha * img.astype(np.float32) + beta), 0, 255))
    cv2.imwrite(img_write_path,colored_img)

def getColorAnno(anno_path,anno_write_path):
    tree = ET.parse(anno_path)
    tree.write(anno_write_path)  # the labels stay unchanged because the transformed image has the same size as the original

def color(alpha,beta,img_dir,anno_dir,img_write_dir,anno_write_dir):
    #### see GitHub for the full code
    for img_name in img_names:
       ### see GitHub for the full code
        getColorImg(alpha,beta,img_path,img_write_path)
        getColorAnno(anno_path,anno_write_path)

alphas=[0.3,0.5,1.2,1.6]
beta=10

When I have time, I will keep adding code for other augmentation schemes.
