Optional parameters are not supported, so many empty tensors have to be passed as placeholder inputs. This may add extra overhead when CUDA graphs are not in use.
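
A minimal sketch of the pattern in question, assuming a PyTorch-style op. `fused_op`, `call_fused_op`, and the `bias`/`scale` arguments are hypothetical names for illustration, not the actual interface under review. The wrapper substitutes a zero-element tensor for every argument the caller leaves out, and the op treats `numel() == 0` as "not provided":

```python
import torch


def fused_op(x, bias, scale):
    # Hypothetical op whose signature accepts only tensors (no Optional).
    # A zero-element tensor is treated as "argument not provided".
    out = x
    if scale.numel() > 0:
        out = out * scale
    if bias.numel() > 0:
        out = out + bias
    return out


def call_fused_op(x, bias=None, scale=None):
    # Replace each missing argument with an empty placeholder tensor.
    # The placeholder is created on every call, which is per-call work
    # on the eager (non-CUDA-graph) path.
    empty = x.new_empty(0)
    return fused_op(
        x,
        bias if bias is not None else empty,
        scale if scale is not None else empty,
    )
```

With many optional inputs, the placeholder handling repeats for each of them on every eager call, which is the overhead this comment is concerned about.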