46 changes: 23 additions & 23 deletions LICENSE
@@ -1,23 +1,23 @@
Copyright (c) 2019, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
137 changes: 73 additions & 64 deletions README.md
@@ -1,64 +1,73 @@
This repository contains the implementation of:
Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition. [Paper](https://arxiv.org/pdf/1904.12659.pdf)

![image](https://github.com/limaosen0/AS-GCN/blob/master/img/pipeline.png)

Abstract: Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton datasets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvements compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction.
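As a rough illustration of the building block described above (this is not the repo's actual `net.as_gcn.Model`; the shapes, the 1x1 feature transform, and treating the actional adjacency as a free parameter are simplifying assumptions), one actional-structural graph convolution followed by a temporal convolution could look like:

```python
import torch
import torch.nn as nn

class ASGCNBlock(nn.Module):
    """One actional-structural graph conv + temporal conv block (sketch).

    x: (N, C, T, V) = batch, channels, frames, joints.
    A_struct: fixed structural adjacency (V, V) from the skeleton graph;
    self.A_act stands in for the actional links inferred by the AIM.
    """

    def __init__(self, in_channels, out_channels, num_joints, t_kernel=9):
        super().__init__()
        # learned actional adjacency; in AS-GCN proper this comes from the
        # A-link inference module's encoder-decoder, not a free parameter
        self.A_act = nn.Parameter(torch.zeros(num_joints, num_joints))
        self.gcn = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        pad = (t_kernel - 1) // 2
        self.tcn = nn.Conv2d(out_channels, out_channels,
                             kernel_size=(t_kernel, 1), padding=(pad, 0))
        self.relu = nn.ReLU()

    def forward(self, x, A_struct):
        A = A_struct + self.A_act                 # generalized skeleton graph
        x = self.gcn(x)                           # per-joint feature transform
        x = torch.einsum('nctv,vw->nctw', x, A)   # aggregate along links
        return self.relu(self.tcn(x))             # temporal convolution
```

For NTU-RGB+D skeletons, e.g., `ASGCNBlock(3, 64, 25)` maps an `(N, 3, T, 25)` joint tensor to `(N, 64, T, 25)`.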

In this repo, we show an example of the model on the NTU-RGB+D dataset.

# Requirements
* Python 3.6
* PyTorch 1.7.1 (this fork updates the original PyTorch 0.4.1 code)
* pyyaml
* argparse
* numpy
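A quick way to check that the environment matches these requirements (a minimal sketch; the version numbers are this fork's assumptions):

```python
import sys

import numpy as np
import torch

# Report interpreter and library versions (this fork targets Python 3.6
# and torch 1.7.1) and whether CUDA is visible for the multi-GPU configs.
print('python:', sys.version.split()[0])
print('torch :', torch.__version__)
print('numpy :', np.__version__)
print('cuda  :', torch.cuda.is_available())
```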

# Environments
We use an input/output interface and system configuration similar to ST-GCN, which requires the torchlight module to be set up. First copy the module files out of the package:
```
cd torchlight
cp torchlight/__init__.py torchlight/gpu.py torchlight/io.py ../
cd ..
```
Then change every `from torchlight import ...` statement to `from torchlight.io import ...`.

Run
```
cd torchlight
python setup.py install
cd ..
```


# Data Preparation
Download the NTU-RGB+D dataset from [NTU-RGB+D](http://rose1.ntu.edu.sg/datasets/actionrecognition.asp) and place it at:
```
'./data/NTU-RGB+D/nturgb+d_skeletons/'
```
Then run the preprocessing script to generate the input data (this step is required):
```
cd data_gen
python ntu_gen_preprocess.py
```

# Training and Testing
With this repo, you first pretrain the A-link inference module (AIM) and save it; then you train the main AS-GCN pipeline. For the recommended Cross-Subject benchmark of NTU-RGB+D:
```
PretrainAIM: python main.py recognition -c config/as_gcn/ntu-xsub/train_aim.yaml --device 0 1 2
TrainMainPipeline: python main.py recognition -c config/as_gcn/ntu-xsub/train.yaml --device 0 --batch_size 4
# use a single GPU for this stage; with more than one it fails with "Caught RuntimeError in replica 0 on device 0"
Test: python main.py recognition -c config/as_gcn/ntu-xsub/test.yaml
```
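The `--device` and `--batch_size` flags in these commands override the values in the YAML config. A minimal sketch of how an st-gcn-style launcher merges the two (illustrative only; `build_parser` and the exact key handling are assumptions, and the real `main.py` exposes many more options through torchlight):

```python
import argparse

def build_parser(defaults):
    """Build a parser whose defaults come from the YAML config, so that
    command-line flags such as --device and --batch_size override the file."""
    parser = argparse.ArgumentParser(description='AS-GCN launcher sketch')
    parser.add_argument('--device', type=int, nargs='+',
                        default=defaults.get('device', [0]))
    parser.add_argument('--batch_size', type=int,
                        default=defaults.get('batch_size', 32))
    return parser

# Values parsed from e.g. train.yaml act as defaults ...
yaml_defaults = {'device': [0, 1, 2, 3], 'batch_size': 32}
# ... and explicit flags win, as in the commands above:
args = build_parser(yaml_defaults).parse_args(['--device', '0', '--batch_size', '4'])
print(args.device, args.batch_size)  # [0] 4
```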

For Cross-View,
```
PretrainAIM: python main.py recognition -c config/as_gcn/ntu-xview/train_aim.yaml
TrainMainPipeline: python main.py recognition -c config/as_gcn/ntu-xview/train.yaml
Test: python main.py recognition -c config/as_gcn/ntu-xview/test.yaml
```

# Acknowledgement
Thanks to the framework provided by [yysijie/st-gcn](https://github.com/yysijie/st-gcn), the source code of the published AAAI-2018 work [ST-GCN](https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17135). We borrow the framework and interface from that code.

# Citation
If you use this code, please cite our paper:
```
@InProceedings{Li_2019_CVPR,
  author    = {Li, Maosen and Chen, Siheng and Chen, Xu and Zhang, Ya and Wang, Yanfeng and Tian, Qi},
  title     = {Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019}
}
```
2 changes: 1 addition & 1 deletion config/as_gcn/ntu-xsub/__init__.py
@@ -1 +1 @@

96 changes: 48 additions & 48 deletions config/as_gcn/ntu-xsub/test.yaml
@@ -1,48 +1,48 @@
work_dir: ./work_dir/recognition/ntu-xsub/AS_GCN
weights1: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch99_model1.pt
weights2: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch99_model2.pt

feeder: feeder.feeder.Feeder
train_feeder_args:
data_path: ./data/nturgb_d/xsub/train_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/train_label.pkl
random_move: True
repeat_pad: True
down_sample: True
test_feeder_args:
data_path: ./data/nturgb_d/xsub/val_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/val_label.pkl
random_move: False
repeat_pad: True
down_sample: True

model1: net.as_gcn.Model
model1_args:
in_channels: 3
num_class: 60
dropout: 0.5
edge_importance_weighting: True
graph_args:
layout: 'ntu-rgb+d'
strategy: 'spatial'
max_hop: 4

model2: net.utils.adj_learn.AdjacencyLearn
model2_args:
n_in_enc: 150
n_hid_enc: 128
edge_types: 3
n_in_dec: 3
n_hid_dec: 128
node_num: 25

device: [0,1,2,3]
batch_size: 32
test_batch_size: 32
num_worker: 4

max_hop_dir: max_hop_4
lamda_act_dir: lamda_05
lamda_act: 0.5

phase: test
108 changes: 54 additions & 54 deletions config/as_gcn/ntu-xsub/train.yaml
@@ -1,54 +1,54 @@
work_dir: ./work_dir/recognition/ntu-xsub/AS_GCN
weights1: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch9_model1.pt
weights2: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch9_model2.pt
feeder: feeder.feeder.Feeder
train_feeder_args:
data_path: ./data/nturgb_d/xsub/train_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/train_label.pkl
random_move: True
repeat_pad: True
down_sample: True
test_feeder_args:
data_path: ./data/nturgb_d/xsub/val_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/val_label.pkl
random_move: False
repeat_pad: True
down_sample: True
model1: net.as_gcn.Model
model1_args:
in_channels: 3
num_class: 60
dropout: 0.5
edge_importance_weighting: True
graph_args:
layout: 'ntu-rgb+d'
strategy: 'spatial'
max_hop: 4
model2: net.utils.adj_learn.AdjacencyLearn
model2_args:
n_in_enc: 150
n_hid_enc: 128
edge_types: 3
n_in_dec: 3
n_hid_dec: 128
node_num: 25
weight_decay: 0.0001
base_lr1: 0.0076
base_lr2: 0.0005
step: [50, 70, 90]
device: [0,1,2,3]
batch_size: 32
test_batch_size: 32
start_epoch: 10
num_epoch: 100
num_worker: 4
max_hop_dir: max_hop_4
lamda_act_dir: lamda_05
lamda_act: 0.5
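The `base_lr1`, `step`, and `weight_decay` keys above describe a step-decay training schedule. A sketch of the equivalent PyTorch setup (dividing by 10 at each milestone, i.e. `gamma=0.1`, is an assumption, not read from this config; the `Linear` model is a stand-in):

```python
import torch

# Step-decay schedule implied by the config above: start at base_lr1 and
# reduce the learning rate at epochs 50, 70, and 90.
model = torch.nn.Linear(3, 60)           # stand-in for the real model
opt = torch.optim.SGD(model.parameters(), lr=0.0076, weight_decay=0.0001)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[50, 70, 90],
                                             gamma=0.1)

for epoch in range(100):
    # ... run one training epoch here ...
    opt.step()                           # placeholder optimizer step
    sched.step()

print(opt.param_groups[0]['lr'])         # reduced to ~7.6e-06 after epoch 90
```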