I ran into an issue where the same data can yield different hashes when using the incremental update API. The deviation seems to occur when the first chunk is too small:
```python
import tlsh

data = b"Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod"
print(f"One-shot: {tlsh.hash(data)}")

# First chunk of 5 bytes: matches the one-shot hash
h_long_inc = tlsh.Tlsh()
h_long_inc.update(data[:5])
h_long_inc.update(data[5:])
h_long_inc.final()
print(f"Long: {h_long_inc.hexdigest()}")

# First chunk of 4 bytes: differs from the one-shot hash
h_short_inc = tlsh.Tlsh()
h_short_inc.update(data[:4])
h_short_inc.update(data[4:])
h_short_inc.final()
print(f"Short: {h_short_inc.hexdigest()}")
```

The above yields the following result:

```
One-shot: T19AA0120D0B41078406C204393AA94058A6082010E26C68420CB6B028112200C8020555
Long: T19AA0120D0B41078406C204393AA94058A6082010E26C68420CB6B028112200C8020555
Short: T141A0121D0B41054402C604393AA94058A2082010E36C58410CB5B024112100C8020559
```
I noticed this when rerunning a test that was baselined ~8 years ago, so this seems to be a regression. For that test, the hash differed in only a single character.
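To narrow down which first-chunk sizes trigger the mismatch, a sweep over every possible split point might help. The following is only a sketch: `split_points_that_diverge` and `tlsh_incremental` are hypothetical helper names, and the only `tlsh` calls used are the ones already shown in the reproduction (`tlsh.hash`, `tlsh.Tlsh()`, `update`, `final`, `hexdigest`). The sweep itself is library-agnostic, so it can be sanity-checked against any hash that is known to be chunking-independent.

```python
from typing import Callable, Iterable, List


def split_points_that_diverge(
    data: bytes,
    one_shot: Callable[[bytes], str],
    incremental: Callable[[Iterable[bytes]], str],
) -> List[int]:
    """Return every split index i where hashing data[:i] and data[i:]
    incrementally disagrees with hashing the whole buffer at once."""
    expected = one_shot(data)
    return [
        i
        for i in range(1, len(data))
        if incremental([data[:i], data[i:]]) != expected
    ]


def tlsh_incremental(chunks: Iterable[bytes]) -> str:
    """Hypothetical adapter: feed chunks through the incremental tlsh API,
    mirroring the calls used in the reproduction above."""
    import tlsh  # imported lazily so the sweep has no hard dependency on tlsh

    h = tlsh.Tlsh()
    for chunk in chunks:
        h.update(chunk)
    h.final()
    return h.hexdigest()


def report_tlsh_divergence(data: bytes) -> List[int]:
    """Run the sweep with tlsh as the hash under test."""
    import tlsh

    return split_points_that_diverge(data, tlsh.hash, tlsh_incremental)
```

Calling `report_tlsh_divergence(data)` with the `data` from the reproduction should list the split indices that produce a different hash. Based on the output above, index 4 would be expected to appear and index 5 would not; whether other indices are affected is not known from this report.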