In the current neura dialect, we assume the kernel is always with loop-related control to ensure a secure start or exit.
However, we may also need to handle the following case:
affine.for (<start, step, bound>) {
neura.kernel(<ins>) {
<loop body>
}
}
where <loop body> is pure computation ops, it is triggered by the outer counter.