Closed
Description
The following filter appears to drop all data for an object when fewer than three observations are available for it:
```python
def _file_dataset_fn(data_path):
  dataset = (
      tf.data.Dataset.list_files(data_path).shuffle(
          files_buffer_size).interleave(
              input_dataset, cycle_length=num_readers, block_length=1)
      # Due to a bug in collection, we sometimes get empty rows.
      .filter(lambda string: tf.strings.length(string) > 0).apply(
          tf.data.experimental.shuffle_and_repeat(shuffle_buffer_size)).map(
              parser_fn, num_parallel_calls=num_map_threads)
      # Only keep sequences of length 2 or more.
      .filter(lambda traj: tf.size(traj.reward) > 2))  # <- HERE
```
In our experiments, where only one or two optimizations are performed on each object, all data may be filtered out, which causes subsequent processing to hang. Do you know why this filter threshold was chosen? Is it safe to change the filter condition to an arbitrary value?
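As a side note, the predicate `tf.size(traj.reward) > 2` keeps only sequences of length 3 or more, even though the comment above it says "length 2 or more". A minimal TensorFlow-free sketch of the two predicates (the trajectory lengths below are hypothetical examples):

```python
# Hypothetical reward-sequence lengths for four trajectories.
reward_lengths = [1, 2, 3, 4]

# The predicate as written: size > 2 keeps only length >= 3.
kept = [n for n in reward_lengths if n > 2]
print(kept)  # [3, 4] -- trajectories of length 1 and 2 are dropped

# What the comment "length 2 or more" suggests was intended: size >= 2.
kept_ge2 = [n for n in reward_lengths if n >= 2]
print(kept_ge2)  # [2, 3, 4]
```

If the `>= 2` variant is what was intended, that would explain why data with exactly two observations per object disappears.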