[paper] Multiple-Human Parsing (MHP) (Zhao et al., 2018) dataset.


Towards Real World Human Parsing: Multiple-Human Parsing in the Wild
Paper: https://arxiv.org/pdf/1705.07206.pdf

The paper introduces a multiple-human parsing dataset: 4980 images (train/val/test: 3000/1000/980), each containing 2-16 persons, annotated with 18 semantic labels.

The multiple-human parsing model MH-Parser has five components:
Representation learner (an FCN that extracts features)
Global parser (generates a parsing map of the whole image from the features)
Candidate nominator (an RPN that generates bounding boxes)
Local parser (generates local parsing maps from the features and the bounding boxes)
Global-local aggregator (combines global and local information into the final parsing of each person)

We introduce the Multiple-Human Parsing (MHP) dataset, which contains multiple persons in a real-world scene per image.

The MHP dataset contains various numbers of persons (from 2 to 16) per image with 18 semantic classes for each parsing annotation. Persons appearing in the MHP images present sufficient variations in pose, occlusion and interaction.

To tackle the multiple-human parsing problem, we also propose a novel Multiple-Human Parser (MH-Parser), which considers both the global context and local cues for each person in the parsing process.

Introduction

Existing human parsing datasets contain only one person per image, while in realistic scenes multiple persons usually appear simultaneously.

Previous work on human parsing mainly focuses on parsing in controlled and simplified conditions, without the simultaneous presence of multiple persons.

We tackle the problem of person detection and human parsing simultaneously, so that both global and local information are employed.

Contributions:

  • We introduce the multiple-human parsing problem that extends the research scope of human parsing and matches real world scenarios better in various applications.

  • We construct a new large-scale benchmark, named Multiple-Human Parsing (MHP) dataset, to advance the development of relevant techniques.

  • We propose a novel MH-Parser model for multiple-human parsing, which integrates global context as well as local cues for human parsing and significantly outperforms the naive “detect-and-parse” approach.

Human parsing

Instance-aware object segmentation

The MHP dataset

This is the first large-scale dataset focusing on multiple-human parsing.

4980 images, each containing 2 to 16 humans, with 14969 person-level annotations in total.

Image collection and annotation methodology

we manually specify several underlying relationships (e.g., family, couple, team, etc.), and several possible scenes (e.g., sports, conferences, banquets, etc.)

The first task is manually counting the number of foreground persons and duplicating each image into several copies according to that number.

The second is to assign fine-grained pixel-wise labels to each instance.

Dataset statistics

training/validation/test split: 3000/1000/980 (randomly chosen)

The images in the MHP dataset contain diverse human numbers, appearances, viewpoints and relationships (see Figure 1).

Multiple-Human Parsing Methods

MH-Parser

The proposed MH-Parser has five components:

  • Representation learner

    We use a trunk network to learn rich and discriminative representations, and preserve the spatial information of the image by employing a fully convolutional network.

    images and annotations => representations

  • Global parser

    Captures the global information of the whole image: the global parser takes the representation from the representation learner and generates a semantic parsing map of the whole image.

    representations => a semantic parsing map of the whole image

  • Candidate nominator

    We use a candidate nominator to generate local regions of interest. The candidate nominator consists of a Region Proposal Network (RPN).

    representations => candidate box

  • Local parser

    Gives a fine-grained prediction of the semantic parsing labels for each person in the image.

    representations, candidate box => semantic parsing labels for each person

  • Global-local aggregator

    Leverages both the global and local information when parsing each person.

    the hidden representations from both the local parser and the global parser => a set of semantic parsing predictions for each candidate box
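
To make the data flow above concrete, here is a minimal PyTorch-style sketch of the five components and of how the aggregator consumes the hidden maps of the global and local parsers. The module names, channel widths, kernel sizes and the use of torchvision's roi_align are my own assumptions for illustration; only the facts stated in this summary and in the implementation details below (ResNet-50 trunk without fully connected layers, stride-16 features, a 1/8-scale global map, RPN proposals, ROI pooled size 40, 18 semantic labels) come from the write-up, and the candidate nominator's RPN is replaced by hand-written boxes.

    import torch
    import torch.nn as nn
    import torchvision
    from torchvision.ops import roi_align

    NUM_CLASSES = 18 + 1  # 18 semantic labels + background (the background class is an assumption)

    class RepresentationLearner(nn.Module):
        """Trunk: ResNet-50 without the fully connected layers, producing stride-16 features."""
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.resnet50(
                weights=None, replace_stride_with_dilation=[False, False, True])
            self.trunk = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
        def forward(self, images):                       # (B, 3, H, W)
            return self.trunk(images)                    # (B, 2048, H/16, W/16)

    class GlobalParser(nn.Module):
        """Deconvolution on top of the trunk: a whole-image parsing map at 1/8 of the input size."""
        def __init__(self, in_ch=2048, hidden_ch=256):
            super().__init__()
            self.hidden = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)
            self.deconv = nn.ConvTranspose2d(hidden_ch, NUM_CLASSES, 4, stride=2, padding=1)
        def forward(self, feats):
            h = torch.relu(self.hidden(feats))           # hidden map, reused by the aggregator
            return self.deconv(h), h

    class LocalParser(nn.Module):
        """Per-person parsing from ROI-pooled trunk features (pooled size 40)."""
        def __init__(self, in_ch=2048, hidden_ch=256, pool=40):
            super().__init__()
            self.pool = pool
            self.hidden = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)
            self.head = nn.Conv2d(hidden_ch, NUM_CLASSES, 1)
        def forward(self, feats, boxes):                 # boxes: list of (Ni, 4) tensors in image coords
            rois = roi_align(feats, boxes, self.pool, spatial_scale=1.0 / 16)
            h = torch.relu(self.hidden(rois))            # hidden map, reused by the aggregator
            return self.head(h), h

    class GlobalLocalAggregator(nn.Module):
        """Fuses local hidden features with ROI-pooled global hidden features per candidate box."""
        def __init__(self, hidden_ch=256, pool=40):
            super().__init__()
            self.pool = pool
            self.head = nn.Conv2d(2 * hidden_ch, NUM_CLASSES, 1)
        def forward(self, local_hidden, global_hidden, boxes):
            g = roi_align(global_hidden, boxes, self.pool, spatial_scale=1.0 / 16)
            return self.head(torch.cat([local_hidden, g], dim=1))

    # Usage with hand-written person boxes standing in for the candidate nominator's RPN
    # proposals (a full RPN needs anchor generation and its own training losses).
    learner, gparser = RepresentationLearner(), GlobalParser()
    lparser, agg = LocalParser(), GlobalLocalAggregator()
    images = torch.randn(1, 3, 600, 800)
    boxes = [torch.tensor([[50., 40., 300., 580.], [320., 60., 560., 590.]])]
    feats = learner(images)
    global_map, g_hidden = gparser(feats)                # (1, 19, H/8, W/8)
    local_map, l_hidden = lparser(feats, boxes)          # (2, 19, 40, 40)
    person_maps = agg(l_hidden, g_hidden, boxes)         # (2, 19, 40, 40), one map per person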

Detect-and-parse baseline

In the detection stage, we use the representation learner and the candidate nominator as the detection model.

In the parsing stage, we use the representation learner and the local parser as the parsing model.
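
For contrast with the MH-Parser sketch above, the baseline simply chains detection and per-box parsing with no global-local aggregation. A hedged sketch, where detector(image) -> boxes and parser(image, box) -> label map are assumed interfaces rather than functions defined in the paper:

    def detect_and_parse(image, detector, parser):
        # Detection stage: representation learner + candidate nominator.
        boxes = detector(image)
        # Parsing stage: representation learner + local parser, applied to each box independently.
        return [(box, parser(image, box)) for box in boxes]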

Experiments

Performance evaluation

The goal of multiple-human parsing is to accurately detect the persons in one image and generate semantic category predictions for each pixel in the detected regions.

Mean average precision based on pixel (mAP^p)

We adopt the pixel-level IOU of different semantic categories on a person.

Percentage of correctly segmented body parts (PCP)

Evaluates how well different semantic categories on a human are segmented.

Global Mean IOU

Evaluates how well the overall parsing predictions match the global parsing labels.
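
A minimal NumPy sketch of the per-class pixel IOU that underlies these metrics; the exact matching and thresholding protocol behind mAP^p and PCP in the paper is not reproduced here, and the function names and the 19-class default (18 labels + background) are my own assumptions.

    import numpy as np

    def per_class_iou(pred, gt, num_classes=19):
        """pred, gt: integer label maps of equal shape; returns one IOU per class
        (NaN for classes that appear in neither map)."""
        ious = np.full(num_classes, np.nan)
        for c in range(num_classes):
            p, g = pred == c, gt == c
            union = np.logical_or(p, g).sum()
            if union > 0:
                ious[c] = np.logical_and(p, g).sum() / union
        return ious

    def mean_iou(pred, gt, num_classes=19):
        """Mean IOU over the classes that appear in the prediction or the ground truth."""
        return float(np.nanmean(per_class_iou(pred, gt, num_classes)))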

Implementation details

  • representation learner

    adopt a residual network [19] with 50 layers, which contains all the layers of a standard residual network except the fully connected layers.

    input: an image with the shorter side resized to 600 pixels and the longer side no larger than 1000 pixels

    output: a feature map with spatial dimension 1/16 of the input image

  • global parser

    add a deconvolution layer after the representation learner.

    output: a feature map with spatial dimension 1/8 of the input image

  • candidate nominator

    use region proposal network (RPN) to generate region proposals.

    output: region proposals

  • local parser

    based on regions obtained by Region of Interest (ROI) pooling from the representation learner's features; the pooled size is 40.

  • global-local aggregator

    the local part comes from the hidden layer of the local parser, and the global part uses features ROI-pooled from the hidden layer of the global parser with the same pooled size.

The network is optimized with one image per batch using the Adam optimizer [20].
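
As a rough illustration of this optimization setup (one image per batch, Adam), the sketch below shows a single training step; the use of plain cross-entropy on the global map and on each per-person map, and the learning rate, are assumptions for illustration, not the paper's stated objective.

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, image, global_target, person_targets, boxes):
        """One Adam step on a single image (batch size 1).
        model(image, boxes) is assumed to return (global_map, person_maps);
        global_target: (1, H/8, W/8) int64, person_targets: (num_persons, 40, 40) int64."""
        optimizer.zero_grad()
        global_map, person_maps = model(image, boxes)
        loss = (F.cross_entropy(global_map, global_target)
                + F.cross_entropy(person_maps, person_targets))
        loss.backward()
        optimizer.step()
        return loss.item()

    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate is an assumption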

Experimental analysis

Overall performance evaluation

RL stands for the representation learner, G for the global parser, L for the local parser, and A for the aggregator.

Qualitative comparison

We can see that the MH-Parser captures more fine-grained details compared to the global parser, as some categories with a small number of pixels are accurately predicted.

Conclusion and future work

In this paper, we introduced the multiple-human parsing problem and a new large-scale MHP dataset for developing and evaluating multiple-human parsing models.

We also proposed a novel MH-Parser algorithm to address this new challenging problem and performed detailed evaluations of the proposed method with different baselines on the new benchmark dataset.

--------------------- Author: lijiancheng0614 | Source: CSDN | Original post: https://blog.csdn.net/lijiancheng0614/article/details/73195221?utm_source=copy

