
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction


Overview





Citation

If you use this code or data in your research, please consider citing:

@article{kim2023pga,
  title={PGA: Personalizing Grasping Agents with Single Human-Robot Interaction},
  author={Kim, Junghyun and Kang, Gi-Cheon and Kim, Jaein and Yang, Seoyun and Jung, Minjoon and Zhang, Byoung-Tak},
  journal={arXiv preprint arXiv:2310.12547},
  year={2023}
}


Table of Contents

  • Environment Setup
  • GraspMine Dataset
  • Reminiscence Construction
  • Object Information Acquisition
  • Propagation through Reminiscence
  • Personalized Object Grounding Model
  • Personalized Object Grasping
  • Experimental Results
  • Acknowledgements

Environment Setup

Prerequisites: Python 3.7+, PyTorch 1.9.1+, CUDA 11+ with cuDNN 7+, and Anaconda/Miniconda (recommended)

  1. Install Anaconda or Miniconda from here.
  2. Clone this repository and create an environment:
git clone https://www.github.com/JHKim-snu/PGA
conda create -n pga python=3.8
conda activate pga
  3. Install all dependencies:
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
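
After installing the dependencies, you can optionally confirm that PyTorch and CUDA are visible with a quick sanity check (not part of the official setup):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"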


GraspMine Dataset

GraspMine is an LCRG (Language-Conditioned Robotic Grasping) dataset collected to validate the grasping agent's personalization capability. GraspMine aims to locate and grasp personal objects given a personal indicator, e.g., "my sleeping pills." GraspMine is built upon 96 personal objects and more than 100 everyday objects.


Training Set

Each sample in the training set includes:

  1. An image containing a personal object.
  2. A natural language description.
| Name | Content | Examples | Size | Link |
| --- | --- | --- | --- | --- |
| HRI.zip | Images from human-robot interaction | 96 | 37.4 MB | Download |
| HRI.json | Personal object descriptions (annotations). Keys are the image_ids in HRI.zip, and values consist of [{general indicator}, {personal indicator}] | 96 | 8 KB | Download |
| HRI.tsv | Preprocessed data for HRI. Each row consists of an image, a personal indicator, and the location of the object | 96 | 50.3 MB | Download |

Each element in HRI.json is as shown below.

"0.png": ["White bottle in front","my sleeping pills"]

Each line in HRI.tsv consists of a unique_id, an image_id (do not use this), a personal indicator, bounding box coordinates, and the image encoded as a base64 string, as shown below.

0	38.png	the flowers for my bedroom	252.41,314.63,351.07,418.89	iVBORw0KGgoAAA....
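
For reference, the sketch below shows one way such a line could be parsed; it assumes Pillow is installed and that the last field is a base64-encoded image, following the field order described above (this is not an official loader).

import base64
import io
import json

from PIL import Image

# HRI.json: {"0.png": ["White bottle in front", "my sleeping pills"], ...}
with open("HRI.json") as f:
    hri = json.load(f)

# HRI.tsv: unique_id, image_id, personal indicator, bbox, base64-encoded image.
with open("HRI.tsv") as f:
    for line in f:
        unique_id, image_id, indicator, bbox, img_str = line.rstrip("\n").split("\t")
        x1, y1, x2, y2 = map(float, bbox.split(","))
        image = Image.open(io.BytesIO(base64.b64decode(img_str)))
        print(unique_id, indicator, (x1, y1, x2, y2), image.size)
        break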

Reminiscence

The Reminiscence consists of 400 raw images of the environment. These raw images can be used during the learning process, but their annotations CANNOT be used in GraspMine.

| Name | Content | Examples | Size | Link |
| --- | --- | --- | --- | --- |
| Reminiscence.zip | Unlabeled images of the Reminiscence | 400 | 129.4 MB | Download |
| Reminiscence_nodes.zip | Cropped object images of the Reminiscence. Every object detected by the object detector is saved as a cropped image | 8270 | 61 MB | Download |
| R_object_features.json | Visual features of the cropped images, extracted with DINO | 8270 | 124 MB | Download |
| Reminiscence_annotations.xlsx | Annotations of the Reminiscence nodes. Each personal indicator is annotated with the {image_id}_{object_id} used in Reminiscence_nodes.zip | 8270 | 4.4 MB | Download |
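
For reference, R_object_features.json could be loaded as sketched below; the assumed key format ({image_id}_{object_id}, mirroring the annotation description above) and value layout are assumptions, not a documented schema.

import json

import numpy as np

# Assumed structure: {"{image_id}_{object_id}": [512 feature values], ...}
with open("R_object_features.json") as f:
    features = {k: np.asarray(v, dtype=np.float32) for k, v in json.load(f).items()}

print(f"loaded DINO features for {len(features)} cropped objects")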

Test Set

Each sample in the test set includes:

  1. Images containing multiple objects.
  2. A natural language personal indicator.
  3. Associated object coordinates.
| Name | Content | Examples | Size | Link | Description |
| --- | --- | --- | --- | --- | --- |
| heterogeneous.zip | Images of the Heterogeneous split | 60 | 19.1 MB | Download | Scenes with randomly selected objects |
| homogeneous.zip | Images of the Homogeneous split | 60 | 18.6 MB | Download | Scenes with similar-looking objects of the same category |
| cluttered.zip | Images of the Cluttered split | 106 | 36.6 MB | Download | Highly cluttered scenes, sourced from the IM-Dial dataset |
| heterogeneous.pth | Annotations for Heterogeneous images | 120 | 12 KB | Download | |
| homogeneous.pth | Annotations for Homogeneous images | 120 | 12 KB | Download | |
| cluttered.pth | Annotations for Cluttered images | 106 | 32 KB | Download | |
| paraphrased.pth | Paraphrased annotations for all splits | 346 | 49 KB | Download | Each personal indicator paraphrased by annotators |

The format of each entry in heterogeneous.pth, homogeneous.pth, cluttered.pth, and paraphrased.pth will be documented here soon. In the meantime, you can inspect the files yourself by downloading them from the links above.
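
Until then, a quick way to inspect these files is to load them with torch.load, as sketched below; nothing about their internal layout is assumed here.

import torch

# Inspect the raw annotation object stored in one of the split files.
annotations = torch.load("heterogeneous.pth", map_location="cpu")
print(type(annotations))
if hasattr(annotations, "__len__"):
    print(len(annotations), "entries")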


Reminiscence Construction

For Reminiscence Construction, we leverage the pretrained classifiers and object detector from Bottom-Up Attention to detect every object in the scene. The code originates from this repository and has been modified for our use.

We strongly recommend using a separate environment for visual feature extraction. Please follow the Prerequisites here.

You can detect the objects on your own, but we also provide the detection results (i.e., cropped object images) in Reminiscence_nodes.zip.



Object Information Acquisition

You need a physical robot to run this part.

To initiate an interaction with the robot, position the personal object in front of it and provide both the general and personal indicators, e.g., "the object in front is my sleeping pills". Using the general indicator (the object placed in front), our system employs GVCCI to locate the object. This process automatically records crucial labels from the interaction, including:

  • an initial image
  • images of the robot-object interaction
  • the personal indicator
  • the object bounding box coordinates

Download the GVCCI model, ENV2(135).

OMP_NUM_THREADS=4 CUDA_VISIBLE_DEVICES=0,1,2,3 python OIA_interaction.py

Upon completion, the following data is automatically saved:

  1. Image Files: A set of {object_num}_{interaction_num}.png files is generated, uniquely capturing each interaction.

  2. Dictionary: A dictionary is created, with keys of the form {object_num} and values that are lists of (general_indicator, personal_indicator) pairs.

OMP_NUM_THREADS=4 CUDA_VISIBLE_DEVICES=0,1,2,3 python OIA_postprocess.py --gvcci_path YOUR_GVCCI_PATH --save_path PATH_TO_SAVE --hri_path YOUR_HRI.json_PATH --cropped_img_path PATH_TO_SAVE_IMGS --raw_img_path YOUR_HRI.zip_PATH --xlsx_path YOUR_Reminiscence_annotations.xlsx.PATH

If you do not have a robot to perform the interaction, the results can alternatively be downloaded from here; the file contains the following information.

img_id: [personal indicator, bounding box coordinates]
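
Assuming the downloaded results are stored as a JSON dictionary in this form, they could be read as sketched below; the file name OIA_results.json and the exact value layout are illustrative assumptions.

import json

# Assumed structure: {"0.png": ["my sleeping pills", [x1, y1, x2, y2]], ...}
with open("OIA_results.json") as f:
    oia = json.load(f)

for img_id, (indicator, bbox) in oia.items():
    print(img_id, indicator, bbox)
    break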


Propagation through Reminiscence

Utilizing the information obtained from Object Information Acquisition, unlabelled images from the Reminiscence dataset are pseudo-labeled using the Propagation through Reminiscence. To execute this, run the following script:

CUDA_VISIBLE_DEVICES=0 python label_propagation.py --model 'vanilla' --thresh 0.55 --iter 3 --save_nodes True --sample_n 400 --ignore_interaction True --seed 777
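
To illustrate the idea behind this step (and not the repository's exact implementation), the sketch below propagates labels from labeled object nodes to unlabeled ones whose DINO features are sufficiently similar, repeating for a few iterations; the node fields follow the table below, and everything else is illustrative.

import numpy as np

def propagate(nodes, thresh=0.55, iters=3):
    """Illustrative threshold-based label propagation over object nodes.

    Each node is a dict with 'visual feature' (np.ndarray), 'label', and
    'labelled' (bool), mirroring the fields listed in the table below.
    """
    feats = np.stack([n["visual feature"] for n in nodes])
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize
    for _ in range(iters):
        for i, src in enumerate(nodes):
            if not src["labelled"]:
                continue
            sims = feats @ feats[i]  # cosine similarity to the labeled node
            for j, tgt in enumerate(nodes):
                if not tgt["labelled"] and sims[j] > thresh:
                    tgt["label"], tgt["labelled"] = src["label"], True  # pseudo-label
    return nodes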

A .pth file will be saved containing a list in which each element represents an object node. Each object node is a dictionary with the following fields:

| Item | Content |
| --- | --- |
| visual feature | 512-dimensional feature vector extracted with DINO |
| category | category of the object |
| label | personal indicator |
| img_id | Reminiscence image id |
| obj_id | object id |
| known | whether the node comes from OIA (True or False) |
| labelled | whether the node is labeled, including pseudo-labels (True or False) |
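
The saved node file can then be inspected as sketched below (assuming it is a plain list of such dictionaries saved with torch.save; the file name is illustrative).

import torch

# Load the propagated object nodes and count (pseudo-)labeled ones.
nodes = torch.load("propagated_nodes.pth", map_location="cpu")
n_labelled = sum(1 for n in nodes if n["labelled"])
print(f"{n_labelled}/{len(nodes)} nodes carry a label or pseudo-label")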


Personalized Object Grounding Model

Our Personalized Object Grounding Model is based on OFA, a state-of-the-art vision-and-language foundation model.

Process Data

You first need to post-process the training and test data for the grounding model. Running the following scripts produces datasets in .tsv format.

python postprocess_all.py
python postprocess_size.py

Training

With the processed data, where each sample comprises an image, a personal indicator, and object coordinates, you can train the grounding model with the following script:

cd run_scripts
nohup sh train.sh

The pre-trained checkpoints of PGA can be found below.

Baseline checkpoints

| OFA | GVCCI | Direct | PassivePGA | PGA | Supervised |
| --- | --- | --- | --- | --- | --- |
| Download | Download | Download | Download | Download | Download |

PGA checkpoints

| 0 | 25 | 100 | 400 |
| --- | --- | --- | --- |
| Download | Download | Download | Download |

Visualization

If you have the pretrained grounding model, you can visualize the prediction results with the following script:

python visualization.py
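
For a quick standalone check, independent of visualization.py, a predicted box can also be drawn with Pillow as sketched below; the function and file names are illustrative.

from PIL import Image, ImageDraw

def draw_prediction(image_path, box, out_path="prediction.png"):
    """Draw a predicted (x1, y1, x2, y2) box on the image and save it."""
    image = Image.open(image_path).convert("RGB")
    ImageDraw.Draw(image).rectangle(box, outline="red", width=3)
    image.save(out_path)

# Example: draw_prediction("38.png", (252, 315, 351, 419))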

Evaluation

You can evaluate your model on the test sets with the following script:

python evaluation.py

For a demonstration (visualization of the inference results), run the following script:

python evaluation.py --demo
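
Grounding accuracy in this line of work is typically the fraction of predictions whose IoU with the ground-truth box exceeds 0.5; a minimal IoU helper is sketched below (the exact criterion used in evaluation.py may differ).

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# A prediction is typically counted as correct when iou(pred, gt) > 0.5.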


Personalized Object Grasping

If you want to reproduce the images of the test set and try your own model, run the following script:

python online_experiment_server.py

The corresponding code for the robot, vg_client.py, is provided alongside.

If you just want to run a demonstration with an image received from the robot and your own query, run:

python online_experiment_server.py --demo
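
The actual protocol between online_experiment_server.py and vg_client.py is not documented here; purely as an illustration, the sketch below shows one simple way a robot-side client could send a length-prefixed image to a grounding server over TCP. All names, the port, and the message format are assumptions.

import socket
import struct

def send_image(image_bytes, host="127.0.0.1", port=9999):
    """Send a length-prefixed image payload and return the server's raw reply (illustrative only)."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack(">I", len(image_bytes)) + image_bytes)
        reply = sock.recv(4096)  # e.g., predicted box coordinates, depending on the server
    return reply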


Experimental Results

We assessed the Personalized Grasping Agent (PGA) on our proposed dataset, GraspMine, benchmarking it against various baselines. The offline experiment measured PGA's efficacy in Personalized Object Grounding, i.e., how well PGA identifies an object given its natural language indicator. Meanwhile, the online experiment probed its real-world performance in personalized Language-Conditioned Robotic Grasping (LCRG) using a robot arm.

Offline Experiments (Localization)



Online Experiments (Real-world)



Please refer to our paper for a more detailed explanation.

Acknowledgements

This repo is built upon OFA, a vision-and-language foundation model. Thank you.