Closed
Description
Reproduce
- Turn on the NS3_SANITIZE option (https://github.com/aliyun/ns-3-alibabacloud/blob/master/simulation/CMakeLists.txt#L61)
- Run simulation as normal
Logs
maxRtt=4720 maxBdp=236000
Running Simulation.
The final active chunks per dimension 1 after allocating to queues is: 1
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
total nodes: 144
Success in opening workload file
model_parallel_NPU_group: is: 8
checkpoints layers are:
layers initiating fwd_in_bckwd are:
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
id: embedding_layer , depen: -1 , wg_comp_time: 1
type: HYBRID_TRANSFORMER_FWD_IN_BCKWD ,num passes: 1 ,lines: 1 compute scale: 1 ,comm scale: 1
stat path: ./ncclFlowModel_ ,total rows: 1 ,stat row: 0
CSV path and filename: ./ncclFlowModel_detailed_144.csv
CSV path and filename: ./ncclFlowModel_EndToEnd_144.csv
=================================================================
==9941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000fd2f74 at pc 0x7f475725362f bp 0x7fff94b9a270 sp 0x7fff94b9a260
READ of size 4 at 0x602000fd2f74 thread T0
#0 0x7f475725362e in MockNccl::MockNcclGroup::InterDouBinTreeShift(MockNccl::MockNcclGroup::DoubleBinaryTreeNode*, std::vector<int, std::allocator<int> >) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2038
#1 0x7f475725200b in MockNccl::MockNcclGroup::genInterDouBinTree(MockNccl::GroupInfo) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2000
#2 0x7f475724e5e3 in MockNccl::MockNcclGroup::gettreechannels(int, MockNccl::GroupType) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:1893
#3 0x7f47571cf384 in MockNccl::MockNcclComm::MockNcclComm(int, MockNccl::GroupType, MockNccl::MockNcclGroup*) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclChannel.cc:22
#4 0x7f475738f260 in AstraSim::Sys::mock_nccl_comms_init() /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:1411
#5 0x7f4757363d59 in AstraSim::Sys::Sys(AstraSim::AstraNetworkAPI*, AstraSim::AstraMemoryAPI*, int, int, int, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, float, float, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, GPUType, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, int) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:297
#6 0x5562830980ce in main /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/scratch/AstraSimNetwork.cc:311
#7 0x7f473bda2d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
#8 0x7f473bda2e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
#9 0x556283050384 in _start (/root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/build/scratch/ns3.36.1-AstraSimNetwork-debug+0x1d3384)
0x602000fd2f74 is located 0 bytes to the right of 4-byte region [0x602000fd2f70,0x602000fd2f74)
allocated by thread T0 here:
#0 0x7f47694b51e7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x55628316e51c in __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
#2 0x556283156623 in std::allocator_traits<std::allocator<int> >::allocate(std::allocator<int>&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
#3 0x556283125b33 in std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
#4 0x5562830fc49b in std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) /usr/include/c++/11/bits/stl_vector.h:361
#5 0x5562830d302a in std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) /usr/include/c++/11/bits/stl_vector.h:305
#6 0x5562830affda in std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) /usr/include/c++/11/bits/stl_vector.h:555
#7 0x7f4757251f96 in MockNccl::MockNcclGroup::genInterDouBinTree(MockNccl::GroupInfo) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2000
#8 0x7f475724e5e3 in MockNccl::MockNcclGroup::gettreechannels(int, MockNccl::GroupType) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:1893
#9 0x7f47571cf384 in MockNccl::MockNcclComm::MockNcclComm(int, MockNccl::GroupType, MockNccl::MockNcclGroup*) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclChannel.cc:22
#10 0x7f475738f260 in AstraSim::Sys::mock_nccl_comms_init() /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:1411
#11 0x7f4757363d59 in AstraSim::Sys::Sys(AstraSim::AstraNetworkAPI*, AstraSim::AstraMemoryAPI*, int, int, int, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, float, float, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, GPUType, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, int) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:297
#12 0x5562830980ce in main /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/scratch/AstraSimNetwork.cc:311
#13 0x7f473bda2d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2038 in MockNccl::MockNcclGroup::InterDouBinTreeShift(MockNccl::MockNcclGroup::DoubleBinaryTreeNode*, std::vector<int, std::allocator<int> >)
Shadow bytes around the buggy address:
0x0c04801f2590: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
0x0c04801f25a0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
0x0c04801f25b0: fa fa fd fa fa fa fd fa fa fa fd fd fa fa fd fa
0x0c04801f25c0: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
0x0c04801f25d0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
=>0x0c04801f25e0: fa fa 04 fa fa fa 04 fa fa fa 00 fa fa fa[04]fa
0x0c04801f25f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2610: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2620: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2630: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==9941==ABORTING
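Reading the report: the faulting address 0x602000fd2f74 sits 0 bytes to the right of a 4-byte region allocated by the std::vector<int> copy made at MockNcclGroup.cc:2000, so the vector passed to InterDouBinTreeShift holds exactly one int and the read at line 2038 touches the element just past it. The sketch below is only an aid for checking that reading, not the SimAI code. It assumes the default ASan shadow mapping on x86_64 Linux (scale 3, offset 0x7fff8000), and `shadow_address`, `first_byte_addressable` and `checked_at` are hypothetical helper names introduced here for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: default ASan mapping on x86_64 Linux,
// shadow = (addr >> 3) + 0x7fff8000.
inline std::uint64_t shadow_address(std::uint64_t addr) {
  return (addr >> 3) + 0x7fff8000ULL;
}

// Shadow byte 00 means all 8 bytes of the granule are addressable.
// Shadow byte k in 01..07 means only the first k bytes are addressable.
// Larger values (fa, fd, ...) mark redzones or freed memory.
inline bool first_byte_addressable(std::uint64_t addr, std::uint8_t shadow_byte) {
  if (shadow_byte == 0) return true;
  if (shadow_byte > 7) return false;
  return (addr & 7) < shadow_byte;
}

// Hypothetical checked accessor (not part of MockNcclGroup): returns a
// pointer to v[i], or nullptr when i is out of range, instead of reading
// past the end of the vector's storage.
inline const int* checked_at(const std::vector<int>& v, std::size_t i) {
  return i < v.size() ? &v[i] : nullptr;
}
```

Under that mapping, 0x602000fd2f74 maps to shadow address 0x0c04801f25ee, which is the bracketed [04] entry in the row starting at 0x0c04801f25e0: only bytes 0 to 3 of that granule are valid, and the read starts at byte 4. A guard in the style of `checked_at` would turn such an access into a detectable null result, though the actual fix depends on what index InterDouBinTreeShift is meant to use for a one-element vector.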