[paper] multi-human parsing (MHP) (Zhao et al., 2018) dataset.

news/2024/10/9 10:47:48/

Towards Real World Human Parsing: Multiple-Human Parsing in the Wild
Paper: https://arxiv.org/pdf/1705.07206.pdf

提出多人语义分割数据集:4980张图片(训练/验证/测试:3000/1000/980),每张包含2-16人,18个语义标签。

多人分割模型MH-Parser包含5个组件:
Representation learner (FCN提特征)
Global parser (用特征生成分割图)
Candidate nominator (RPN生成bbox)
Local parser (使用特征和bbox生成局部分割)
Global-local aggregator (结合全局和局部信息得到最终每个人的分割)

we introduce the Multiple-Human Parsing (MHP) dataset, which contains multiple persons in a real world scene per single image.

The MHP dataset contains various numbers of persons (from 2 to 16) per image with 18 semantic classes for each parsing annotation. Persons appearing in the MHP images present sufficient variations in pose, occlusion and interaction.

To tackle the multiple-human parsing problem, we also propose a novel Multiple-Human Parser (MH-Parser), which considers both the global context and local cues for each person in the parsing process.

Introduction

all the human parsing datasets only contain one person per image, while usually multiple persons appear simultaneously in a realistic scene.

Previous work on human parsing mainly focuses on the problem of parsing in controlled and simplified conditions.

simultaneous presence of multiple persons.

we tackle the problem of person detection and human parsing simultaneously so that both the global information and the local information are employed.

contributions:

  • We introduce the multiple-human parsing problem that extends the research scope of human parsing and matches real world scenarios better in various applications.

  • We construct a new large-scale benchmark, named Multiple-Human Parsing (MHP) dataset, to advance the development of relevant techniques.

  • We propose a novel MH-Parser model for multiple-human parsing, which integrates global context as well as local cues for human parsing and significantly outperforms the naive “detect-and-parse” approach.

Human parsing

Instance-aware object segmentation

The MHP dataset

this is the first large scale dataset focusing on multiple-human parsing.

4980 images, each image contains 2 to 16 humans, totally there are 14969 person level annotations.

Image collection and annotation methodology

we manually specify several underlying relationships (e.g., family, couple, team, etc.), and several possible scenes (e.g., sports, conferences, banquets, etc.)

The first task is manually counting the number of foreground persons and duplicating each image into several copies according to that number.

the second is to assign the fine-grained pixel-wise label for each instance.

Dataset statistics

training/validation/test: 3000/1000/980 (randomly choose)

The images in the MHP dataset contain diverse human numbers, appearances, viewpoints and relationships (see Figure 1).

Multiple-Human Parsing Methods

MH-Parser

The proposed MH-Parser has five components:

  • Representation learner

    We use a trunk network to learn rich and discriminative representations. we preserve the spatial information of the image by employing fully convolutional neural networks.

    images and annotations => representations

  • Global parser

    capture the global information of the whole image. The global parser takes the representation from the representation learner and generates a semantic parsing map of the whole image.

    representations => a semantic parsing map of the whole image

  • Candidate nominator

    We use a candidate nominator to generate local regions of interest. The candidate nominator consists of a Region Proposal Network (RPN).

    representations => candidate box

  • Local parser

    give a fine-grained prediction of the semantic parsing labels for each person in the image.

    representations, candidate box => semantic parsing labels for each person

  • Global-local aggregator

    leverages both the global and local information when performing the parsing task of each person.

    the hidden representations from both the local parser and the global parser => a set of semantic parsing predictions for each candidate box

Detect-and-parse baseline

In the detection stage, we use the representation learner and the candidate nominator as the detection model.

In the parsing stage, we use the representation learner and the local prediction as the parsing model.

Experiments

Performance evaluation

The goal of multiple-human parsing is to accurately detect the persons in one image and generate semantic category predictions for each pixel in the detected regions.

Mean average precision based on pixel (mAPpmAPp)

we adopt pixel-level IOU of different semantic categories on a person.

Percentage of correctly segmented body parts (PCP)

evaluate how well different semantic categories on a human are segmented.

Global Mean IOU

evaluates how well the overall parsing predictions match the overall global parsing labels.

Implementation details

  • representation learner

    adopt a residual network [19] with 50 layers, contains all the layers in a standard residual network except the fully connected layers.

    input: an image with the shorter side resized to 600 pixels and the longer side no larger than 1000 pixels

    output: 1/16 of the spatial dimension of the input image

  • global parser

    add a deconvolution layer after the representation learner.

    output: a feature map with spatial dimension 1/8 of the input image

  • candidate nominator

    use region proposal network (RPN) to generate region proposals.

    output: region proposals

  • local parser

    based on the region after Region of Interest (ROI) pooling from the representation learner and the size after pooling is 40.

  • global-local aggregator

    the local part is from the hidden layer in the local parser, and the global part uses the feature after ROI pooling from the hidden layer of the global parser with the same pooled size.

The network is optimized with one image per batch and the optimizer used is Adam [20].

Experimental analysis

Overall performance evaluation

RL stands for the representation learner, G means the global parser, L denotes the local parser, A for aggregator.

Qualitative comparison

We can see that the MH-Parser captures more fine-grained details compared to the global parser, as some categories with a small number of pixels are accurately predicted.

Conclusion and future work

In this paper, we introduced the multiple-human parsing problem and a new large-scale MHP dataset for developing and evaluating multiple-human parsing models.

We also proposed a novel MH-Parser algorithm to address this new challenging problem and performed detailed evaluations of the proposed method with different baselines on the new benchmark dataset.

--------------------- 作者:lijiancheng0614 来源:CSDN 原文:https://blog.csdn.net/lijiancheng0614/article/details/73195221?utm_source=copy 版权声明:本文为博主原创文章,转载请附上博文链接!


http://www.ppmy.cn/news/837492.html

相关文章

mhp2nbsp;BOSS属性列表+部分BOSS打法

リオレイア (雌火龙) 弱点部位:(切断,打击,弹)头,腹 弱点属性:龙 ,雷 耐:火 イヤンクツク(大怪鸟) 弱点部位&#xff1a…

【深度学习】:用于 GAN 的生成器架构 - 生成人脸

一、说明 生成对抗网络(GAN)是机器学习中一个相对较新的概念,于2014年首次推出。他们的目标是合成与真实图像无法区分的人工样本,例如图像。GAN 应用程序的一个常见示例是通过从名人人脸数据集中学习来生成人工人脸图像。虽然GAN图像随着时间的推移变得更加逼真,但它们的主…

golang for range循环坑

比较两段代码: package mainimport "fmt"func main() {a : []int{1, 2, 3, 4, 5, 6, 7, 8, 9}for len(a) > 0 {a a[1:]fmt.Println(a)} }输出 [2 3 4 5 6 7 8 9] [3 4 5 6 7 8 9] [4 5 6 7 8 9] [5 6 7 8 9] [6 7 8 9] [7 8 9] [8 9] [9] []这是符合…

现在的网吧生活

在网吧里,任何时间都会有人在吃饭,有人在睡觉,有人在玩。不论是清晨,中午,还是深夜,都有一些人挺不住了,他们用疲惫无神的双眼仔细注视着屏幕,仿佛 创作中的马克思。他们是铁人&…

没有磁盘计算机无法运行,网吧的电脑大多数都没有硬盘,那电脑怎么运行

很多网吧的电脑上确实没有安装硬盘,大都是采用了无盘技术。无盘技术简单讲就是购买专用的服务器,然后把客户机(我们在网吧用的电脑)上的系统镜像到服务器上,客户机启动后通过主板上的网卡引导服务器的系统镜像至本地,读取数据来引…

有盘网吧和无盘服务器有什么区别,有盘网吧和无盘网吧各有千秋

前面你可能已经了解过了有盘网吧和无盘网吧的组建方案,那么作为一个好的网吧技术人员该如何为网吧选择一种好的类型呢?我想这个问题当你看过下面有盘网吧和无盘网吧的优缺点介绍自然心中就有了答案。 有盘网吧的优点是: 第一:在脱…

华为网吧服务器型号,网吧服务器推荐

网吧服务器推荐 内容精选 换一换 如果密码丢失、或创建时未设置密码,推荐您在控制台设置登录密码。 只有运行中的云服务器云主机才允许用户登录。Windows操作系统用户名“Administrator”。首次登录云耀云服务器,请先通过“重置密码”功能设置登录密码。…

网吧无盘主副服务器,网吧无盘服务器教程

目录 第一篇;什么是无盘? 解释无盘 第二篇;无盘的构造包括那些? 1、读盘;什么是读盘? 2、写盘;什么是写盘? 3、系统盘;什么是系统盘? 4、PXE;什么…