Conversation

@CongHoanCoder (Contributor)

I made a few changes to your code. Please help me review it. Thanks!

Owner

Why is this being deleted entirely?

Contributor Author

This file is too heavy to commit, so I shifted it to LFS. I didn't change the file itself, so just reject this change.

outputs/train/aloha_sim_transfer_cube_human/model_10000.safetensors filter=lfs diff=lfs merge=lfs -text
resnet18-f37072fd.safetensors filter=lfs diff=lfs merge=lfs -text
outputs/train/act_aloha_sim_transfer_cube_human/model_10000.safetensors filter=lfs diff=lfs merge=lfs -text
*.ipynb filter=lfs diff=lfs merge=lfs -text
Owner

Why is this being added? Are you saying you want to keep .ipynb files hosted in LFS?

Owner

This is good - agreed

Owner

Although we need to work on the naming. No spaces in the names. I'd create a directory called tests/ and put test.py under there

Owner

Are you deleting this file entirely, or are you shifting it to LFS? If so, where's the LFS upload?

Contributor Author

This file is too heavy to commit, so I shifted it to LFS. I didn't change the file itself, so just reject this change.

import tinygrad
from tinygrad import Tensor, nn, dtypes
from tinygrad.ops import Variable
# from tinygrad.ops import Variable
Owner

No dangling comments. Why is this commented out? What shifted?

Contributor Author

It has been shifted to `from tinygrad import Variable`.

Owner

Where's that import? Is it needed now?

Comment on lines +55 to +62
mean_val = stats[key]["mean"]
std_val = stats[key]["std"]
# Convert to numpy if needed (handle both numpy arrays and torch tensors)
if hasattr(mean_val, 'numpy'):
mean_val = mean_val.numpy()
if hasattr(std_val, 'numpy'):
std_val = std_val.numpy()
buffer["mean"].assign(mean_val)
Owner

This is okay, but it's just verbose. Is there any way to make this into a one or two liner?
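One way to collapse it into a couple of lines is a small duck-typed helper (a sketch; `to_numpy` is a hypothetical helper name, and the `hasattr` check mirrors the original code's test for a `.numpy()` method):

```python
import numpy as np

def to_numpy(x):
    # Torch tensors expose .numpy(); plain numpy arrays pass through unchanged.
    return x.numpy() if hasattr(x, "numpy") else x

# Replaces the eight-line block with one call per stat value.
stats = {"action": {"mean": np.zeros(3), "std": np.ones(3)}}
mean_val, std_val = (to_numpy(stats["action"][k]) for k in ("mean", "std"))
```

The same helper would serve the min/max block below as well.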

Comment on lines +67 to +74
min_val = stats[key]["min"]
max_val = stats[key]["max"]
# Convert to numpy if needed (handle both numpy arrays and torch tensors)
if hasattr(min_val, 'numpy'):
min_val = min_val.numpy()
if hasattr(max_val, 'numpy'):
max_val = max_val.numpy()
buffer["min"].assign(min_val)
Owner

(Read above)

Owner

Why is this deleted?

Contributor Author

I backed up your result and trained it again to compare results. You can reject the change.

Owner

What changed?

Contributor Author

I backed up your result and trained it again to compare results. You can reject the change.

import pathlib
resnet18_IMAGENET1K = ResNet(Block, [2, 2, 2, 2], num_classes=1000)
state_dict = nn.state.safe_load("resnet18-f37072fd.safetensors")
model_path = pathlib.Path(__file__).parent / "resnet18-f37072fd.safetensors"
Owner

good

Comment on lines +19 to +23
# parser.add_argument("--env_name", type=str, choices=['AlohaTransferCube-v0', 'AlohaInsertion-v0'], default='AlohaTransferCube-v0')
parser.add_argument("--env_name", type=str, choices=['AlohaTransferCube-v0', 'AlohaInsertion-v0'], default='AlohaInsertion-v0')
# parser.add_argument("--model_path", type=str, default='outputs/train/aloha_sim_transfer_cube_human/model_final.safetensors')
# parser.add_argument("--model_path", type=str, default='outputs/train/aloha_sim_transfer_cube_human/model_final.safetensors')
parser.add_argument("--model_path", type=str, default='outputs/train/aloha_sim_insertion_human/model_30000_original.safetensors')
Owner

No dangling comments. Also, why would we peg the default model to 30,000 steps instead of final?

Contributor Author (@CongHoanCoder, Jan 21, 2026)

In the original version there was only the model with 30,000 steps, so I just ran train.py again to get the models and compared the results. Both the 30,000-step model and the final model are not good now. You can reject the change and help me compare the results.

fps = env.metadata["render_fps"]

# Encode all frames into a mp4 video.
video_path = output_directory / "rollout.mp4"
Owner

why rollout3 vs rollout?

Contributor Author (@CongHoanCoder, Jan 21, 2026)

It is just for testing, to compare against your earlier result.

Comment on lines +3 to +5
# from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

from lerobot.datasets.lerobot_dataset import LeRobotDataset
Owner

Didn't know it shifted.

Again, no dangling comments.

# Start of training code
parser=argparse.ArgumentParser(description="Argument Parser for ACT training on simulated environments")
parser.add_argument("env_name", type=str, choices=['aloha_sim_transfer_cube_human', 'aloha_sim_insertion_human'], default='aloha_sim_insertion_human')
parser.add_argument("--env_name", type=str, choices=['aloha_sim_transfer_cube_human', 'aloha_sim_insertion_human'], default='aloha_sim_insertion_human')
Owner

Yep solid

Comment on lines +95 to +104
########################################################################
# Handle unused parameters by assigning zero gradients
optimizers_list = [opt]
if cfg.train_backbone_separately:
optimizers_list.append(opt_backbone)
for optimizer in optimizers_list:
for param in optimizer.params:
if param.grad is None:
# Create zero gradient with same shape and device as parameter
param.grad = Tensor.zeros(*param.shape, device=param.device, requires_grad=False)
Owner

can you explain this a bit?

Contributor Author

Some parameters didn't have gradients, so when I ran it I got the error in the attachment. I searched, and the issue is that opt.step() is called without first ensuring all parameters have gradients: not all parameters are used in the forward pass, so they don't receive gradients. Then I changed the code like this.

[attached screenshot: train error]

Owner

got it - okay!

Comment on lines +138 to +145
batch_converted = {}
for k, v in batch.items():
if isinstance(v, torch.Tensor):
batch_converted[k] = Tensor(v.detach().cpu().numpy(), requires_grad=False)
else:
batch_converted[k] = v # Keep strings, lists, etc. as-is

batch = batch_converted
Owner

So this makes sense in theory. But why are you mixing Torch and tinygrad tensors? Where's it coming from?

Contributor Author

With the original code, I encountered the error below. The issue was that the batch contains mixed data types: some values are torch tensors and some are lists or other non-tensor types, so I changed it.

[attached screenshot]
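For reference, the loop above can also be written as a single dict comprehension. A generic sketch (`is_tensor` and `convert` are stand-ins for `isinstance(v, torch.Tensor)` and the `Tensor(v.detach().cpu().numpy(), requires_grad=False)` constructor from the diff):

```python
def convert_batch(batch, is_tensor, convert):
    # Tensor-typed values are converted; strings, lists, and other
    # non-tensor values pass through unchanged.
    return {k: convert(v) if is_tensor(v) else v for k, v in batch.items()}
```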

Owner (@mdaiter, Jan 21, 2026)

Cool - @CongHoanCoder, just clean up the pull request (e.g. no commented-out imports/code, that's sloppy; also no randomly deleted files). I can check again and validate on my side when you're ready.
