Invoking calculon
PYTHONPATH=LLMSIM/calculon/ LLMSIM/calculon/bin/calculon -h
PYTHONPATH=LLMSIM/calculon/ LLMSIM/calculon/bin/calculon llm LLMSIM/calculon/models/megatron-1T.json LLMSIM/calculon/examples/3072_t4_p64_d12_mbs4_full.json LLMSIM/calculon/systems/a100_80g.json -
PYTHONPATH=LLMSIM/calculon/ LLMSIM/calculon/bin/calculon llm-optimal-execution LLMSIM/calculon/models/turing-530B.json 5128 2520 float16 LLMSIM/calculon/systems/a100_80g.json LLMSIM/output/output.json -m
PYTHONPATH=LLMSIM/calculon/ LLMSIM/calculon/bin/calculon llm-all-executions LLMSIM/calculon/models/megatron-1T.json 10256 5040 float16 LLMSIM/calculon/systems/a100_80g.json LLMSIM/output/all-megatron-1T.csv
# Some parameters are fixed
PYTHONPATH=LLMSIM/calculon/ LLMSIM/calculon/bin/calculon llm-optimal-execution LLMSIM/calculon/models/megatron-1T.json 16384 4096 float16 LLMSIM/calculon/systems/a100_80g.json LLMSIM/output/output-fixed.json -m --fixed --execution LLMSIM/calculon/examples/1T_tx_px_dx_mbs1_full.json
Running llmsim
python LLMSIM/llmsim.py 16384 4096 LLMSIM/calculon/models/megatron-1T.json LLMSIM/calculon/systems/a100_80g.json
- Understand the meaning and value range of each calculon search variable
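- Complete the SBO-LLMSIM code framework
- Sort out the logic calculon uses to generate parameter sets (currently it sometimes fails to find any valid configuration)
- Clarify the constraint relationships between the variables, integrate them into the SBO-LLMSIM evaluation function, and test
- Fix the unreasonable traversal logic in calculon and rerun the comparison experiments
- Can the Failed cases be expressed as constraints?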
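One of the constraints between variables can be sketched as below. This assumes, as the example file name 3072_t4_p64_d12_mbs4_full.json suggests (4 x 64 x 12 = 3072), that the tensor, pipeline, and data parallel degrees must multiply to the processor count. `parallelism_splits` is a hypothetical helper written for these notes, not part of calculon, and it ignores every other constraint (memory, batch size, layer counts).

```python
def parallelism_splits(num_procs):
    """Yield every ordered (t, p, d) with t * p * d == num_procs.

    Assumption: t, p, d stand for tensor, pipeline, and data parallel
    degrees, and their product must equal the processor count.
    """
    for t in range(1, num_procs + 1):
        if num_procs % t:
            continue  # t must divide the processor count
        rest = num_procs // t
        for p in range(1, rest + 1):
            if rest % p == 0:
                yield t, p, rest // p


splits = list(parallelism_splits(3072))
```

Enumerating only these triples keeps the search inside the feasible set for this one constraint, which may reduce the number of configurations that are rejected later.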
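Currently scipy's minimize function fails during optimization, raising either TypeError: unsupported operand type(s) for -: 'int' and 'NoneType' or double free or corruption (!prev). The constraints may need to be reduced or reformulated ==> replace or modify the calculon framework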
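The TypeError is consistent with a None value reaching arithmetic somewhere in the optimization, for example when an evaluation of an infeasible configuration returns None; the notes do not confirm where it originates. Under that assumption, a minimal sketch of one workaround is to wrap the objective so that failed evaluations return a large finite penalty. `with_penalty`, `toy_evaluate`, and the `PENALTY` value are all hypothetical and stand in for the real SBO-LLMSIM evaluation function.

```python
import math

PENALTY = 1e12  # hypothetical value; should exceed any realistic objective


def with_penalty(evaluate, penalty=PENALTY):
    """Wrap `evaluate` so None or non-finite results become a finite penalty."""
    def wrapped(x):
        value = evaluate(x)
        if value is None or not math.isfinite(value):
            return penalty
        return float(value)
    return wrapped


def toy_evaluate(x):
    # Hypothetical stand-in for a simulator call:
    # returns None when the configuration is treated as infeasible.
    t, p = x
    if t * p > 64:
        return None
    return 100.0 / (t * p)


objective = with_penalty(toy_evaluate)
```

The wrapped `objective` is what would be passed to `scipy.optimize.minimize` in place of the raw evaluation function. A penalty makes the objective discontinuous, so gradient-based methods may still behave poorly, and this sketch does nothing about the double free or corruption crash.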