46 changes: 23 additions & 23 deletions LICENSE
@@ -1,23 +1,23 @@
Copyright (c) 2019, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
137 changes: 73 additions & 64 deletions README.md
@@ -1,64 +1,73 @@
This repository contains the implementation of:
Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition. [Paper](https://arxiv.org/pdf/1904.12659.pdf)

![image](https://github.com/limaosen0/AS-GCN/blob/master/img/pipeline.png)

Abstract: Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton datasets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvements compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction.
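As a rough illustration of the building block described above (this is not the repo's actual `net.as_gcn.Model`; the shapes, the 1x1 feature transform, and treating the actional adjacency as a free parameter are simplifying assumptions), one actional-structural graph convolution followed by a temporal convolution could look like:

```python
import torch
import torch.nn as nn

class ASGCNBlock(nn.Module):
    """One actional-structural graph conv + temporal conv block (sketch).

    x: (N, C, T, V) = batch, channels, frames, joints.
    A_struct: fixed structural adjacency (V, V) from the skeleton graph;
    self.A_act stands in for the actional links inferred by the AIM.
    """

    def __init__(self, in_channels, out_channels, num_joints, t_kernel=9):
        super().__init__()
        # learned actional adjacency; in AS-GCN proper this comes from the
        # A-link inference module's encoder-decoder, not a free parameter
        self.A_act = nn.Parameter(torch.zeros(num_joints, num_joints))
        self.gcn = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        pad = (t_kernel - 1) // 2
        self.tcn = nn.Conv2d(out_channels, out_channels,
                             kernel_size=(t_kernel, 1), padding=(pad, 0))
        self.relu = nn.ReLU()

    def forward(self, x, A_struct):
        A = A_struct + self.A_act                 # generalized skeleton graph
        x = self.gcn(x)                           # per-joint feature transform
        x = torch.einsum('nctv,vw->nctw', x, A)   # aggregate along links
        return self.relu(self.tcn(x))             # temporal convolution
```

For NTU-RGB+D skeletons, e.g., `ASGCNBlock(3, 64, 25)` maps an `(N, 3, T, 25)` joint tensor to `(N, 64, T, 25)`.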

In this repo, we show an example of the model on the NTU-RGB+D dataset.

# Requirements
* Python 3.6
* PyTorch 1.7.1 (this fork updates the original PyTorch 0.4.1 code)
* pyyaml
* argparse
* numpy
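A quick way to check that the environment matches these requirements (a minimal sketch; the version numbers are this fork's assumptions):

```python
import sys

import numpy as np
import torch

# Report interpreter and library versions (this fork targets Python 3.6
# and torch 1.7.1) and whether CUDA is visible for the multi-GPU configs.
print('python:', sys.version.split()[0])
print('torch :', torch.__version__)
print('numpy :', np.__version__)
print('cuda  :', torch.cuda.is_available())
```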

# Environments
We use an input/output interface and system configuration similar to ST-GCN, which requires the torchlight module to be set up. First copy the module files out of the package:
```
cd torchlight
cp torchlight/__init__.py torchlight/gpu.py torchlight/io.py ../
cd ..
```
Then change every `from torchlight import ...` statement to `from torchlight.io import ...`.

Run
```
cd torchlight
python setup.py install
cd ..
```


# Data Preparation
Download the NTU-RGB+D dataset from [NTU-RGB+D](http://rose1.ntu.edu.sg/datasets/actionrecognition.asp) and place it at:
```
'./data/NTU-RGB+D/nturgb+d_skeletons/'
```
Then run the preprocessing script to generate the input data (this step is required):
```
cd data_gen
python ntu_gen_preprocess.py
```

# Training and Testing
With this repo, you first pretrain the A-link inference module (AIM) and save it; then you train the main AS-GCN pipeline. For the recommended Cross-Subject benchmark of NTU-RGB+D:
```
PretrainAIM: python main.py recognition -c config/as_gcn/ntu-xsub/train_aim.yaml --device 0 1 2
TrainMainPipeline: python main.py recognition -c config/as_gcn/ntu-xsub/train.yaml --device 0 --batch_size 4
# use a single GPU for this stage; with more than one it fails with "Caught RuntimeError in replica 0 on device 0"
Test: python main.py recognition -c config/as_gcn/ntu-xsub/test.yaml
```
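The `--device` and `--batch_size` flags in these commands override the values in the YAML config. A minimal sketch of how an st-gcn-style launcher merges the two (illustrative only; `build_parser` and the exact key handling are assumptions, and the real `main.py` exposes many more options through torchlight):

```python
import argparse

def build_parser(defaults):
    """Build a parser whose defaults come from the YAML config, so that
    command-line flags such as --device and --batch_size override the file."""
    parser = argparse.ArgumentParser(description='AS-GCN launcher sketch')
    parser.add_argument('--device', type=int, nargs='+',
                        default=defaults.get('device', [0]))
    parser.add_argument('--batch_size', type=int,
                        default=defaults.get('batch_size', 32))
    return parser

# Values parsed from e.g. train.yaml act as defaults ...
yaml_defaults = {'device': [0, 1, 2, 3], 'batch_size': 32}
# ... and explicit flags win, as in the commands above:
args = build_parser(yaml_defaults).parse_args(['--device', '0', '--batch_size', '4'])
print(args.device, args.batch_size)  # [0] 4
```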

For Cross-View,
```
PretrainAIM: python main.py recognition -c config/as_gcn/ntu-xview/train_aim.yaml
TrainMainPipeline: python main.py recognition -c config/as_gcn/ntu-xview/train.yaml
Test: python main.py recognition -c config/as_gcn/ntu-xview/test.yaml
```

# Acknowledgement
Thanks to the framework provided by [yysijie/st-gcn](https://github.com/yysijie/st-gcn), the source code of the published AAAI-2018 work [ST-GCN](https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17135). We borrow the framework and interface from that code.

# Citation
If you use this code, please cite our paper:
```
@InProceedings{Li_2019_CVPR,
  author    = {Li, Maosen and Chen, Siheng and Chen, Xu and Zhang, Ya and Wang, Yanfeng and Tian, Qi},
  title     = {Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019}
}
```
2 changes: 1 addition & 1 deletion config/as_gcn/ntu-xsub/__init__.py
@@ -1 +1 @@

96 changes: 48 additions & 48 deletions config/as_gcn/ntu-xsub/test.yaml
@@ -1,48 +1,48 @@
work_dir: ./work_dir/recognition/ntu-xsub/AS_GCN
weights1: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch99_model1.pt
weights2: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch99_model2.pt

feeder: feeder.feeder.Feeder
train_feeder_args:
data_path: ./data/nturgb_d/xsub/train_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/train_label.pkl
random_move: True
repeat_pad: True
down_sample: True
test_feeder_args:
data_path: ./data/nturgb_d/xsub/val_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/val_label.pkl
random_move: False
repeat_pad: True
down_sample: True

model1: net.as_gcn.Model
model1_args:
in_channels: 3
num_class: 60
dropout: 0.5
edge_importance_weighting: True
graph_args:
layout: 'ntu-rgb+d'
strategy: 'spatial'
max_hop: 4

model2: net.utils.adj_learn.AdjacencyLearn
model2_args:
n_in_enc: 150
n_hid_enc: 128
edge_types: 3
n_in_dec: 3
n_hid_dec: 128
node_num: 25

device: [0,1,2,3]
batch_size: 32
test_batch_size: 32
num_worker: 4

max_hop_dir: max_hop_4
lamda_act_dir: lamda_05
lamda_act: 0.5

phase: test
108 changes: 54 additions & 54 deletions config/as_gcn/ntu-xsub/train.yaml
@@ -1,54 +1,54 @@
work_dir: ./work_dir/recognition/ntu-xsub/AS_GCN
weights1: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch9_model1.pt
weights2: ./work_dir/recognition/ntu-xsub/AS_GCN/max_hop_4/lamda_05/epoch9_model2.pt
feeder: feeder.feeder.Feeder
train_feeder_args:
data_path: ./data/nturgb_d/xsub/train_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/train_label.pkl
random_move: True
repeat_pad: True
down_sample: True
test_feeder_args:
data_path: ./data/nturgb_d/xsub/val_data_joint_pad.npy
label_path: ./data/nturgb_d/xsub/val_label.pkl
random_move: False
repeat_pad: True
down_sample: True
model1: net.as_gcn.Model
model1_args:
in_channels: 3
num_class: 60
dropout: 0.5
edge_importance_weighting: True
graph_args:
layout: 'ntu-rgb+d'
strategy: 'spatial'
max_hop: 4
model2: net.utils.adj_learn.AdjacencyLearn
model2_args:
n_in_enc: 150
n_hid_enc: 128
edge_types: 3
n_in_dec: 3
n_hid_dec: 128
node_num: 25
weight_decay: 0.0001
base_lr1: 0.0076
base_lr2: 0.0005
step: [50, 70, 90]
device: [0,1,2,3]
batch_size: 32
test_batch_size: 32
start_epoch: 10
num_epoch: 100
num_worker: 4
max_hop_dir: max_hop_4
lamda_act_dir: lamda_05
lamda_act: 0.5
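The `base_lr1`, `step`, and `weight_decay` keys above describe a step-decay training schedule. A sketch of the equivalent PyTorch setup (dividing by 10 at each milestone, i.e. `gamma=0.1`, is an assumption, not read from this config; the `Linear` model is a stand-in):

```python
import torch

# Step-decay schedule implied by the config above: start at base_lr1 and
# reduce the learning rate at epochs 50, 70, and 90.
model = torch.nn.Linear(3, 60)           # stand-in for the real model
opt = torch.optim.SGD(model.parameters(), lr=0.0076, weight_decay=0.0001)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[50, 70, 90],
                                             gamma=0.1)

for epoch in range(100):
    # ... run one training epoch here ...
    opt.step()                           # placeholder optimizer step
    sched.step()

print(opt.param_groups[0]['lr'])         # reduced to ~7.6e-06 after epoch 90
```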