Detection Phase

Hello,

I have already generated watermarked data with the code sample below:

```
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

# Standard model and tokenizer initialization
tokenizer = AutoTokenizer.from_pretrained('repo/id')
model = AutoModelForCausalLM.from_pretrained('repo/id')

# SynthID Text configuration
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, ...],
    ngram_len=5,
)

# Generation with watermarking
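# return_tensors="pt" is needed so that generate() receives tensors, not Python lists
tokenized_prompts = tokenizer(["your prompts here"], return_tensors="pt")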
output_sequences = model.generate(
    **tokenized_prompts,
    watermarking_config=watermarking_config,
    do_sample=True,
)
watermarked_text = tokenizer.batch_decode(output_sequences)
```
Could you help me with the following questions?
How do I train the detector, and how do I detect the watermark?
My output text is at most 200 tokens long. Can you suggest a detection threshold for that length?
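
To make the threshold question concrete: one common approach is to calibrate the threshold empirically instead of using a fixed value, by scoring held-out unwatermarked text of the same length (here, up to 200 tokens) and picking the cutoff that meets a target false positive rate. The sketch below assumes a detector already produces one score per text, with higher scores meaning "more likely watermarked". The function names `threshold_at_fpr` and `detection_rates` and the score lists are hypothetical illustrations, not part of the SynthID Text or `transformers` API.

```python
def threshold_at_fpr(unwatermarked_scores, target_fpr=0.01):
    """Pick a threshold so that at most `target_fpr` of the unwatermarked
    scores are strictly greater than it (flag rule: score > threshold)."""
    if not unwatermarked_scores:
        raise ValueError("need at least one unwatermarked score")
    ranked = sorted(unwatermarked_scores)
    n = len(ranked)
    # Number of unwatermarked texts we tolerate being flagged by mistake.
    # The small epsilon guards against float rounding in target_fpr * n.
    allowed = min(int(target_fpr * n + 1e-9), n - 1)
    # At most `allowed` scores can lie strictly above this element.
    return ranked[n - allowed - 1]


def detection_rates(threshold, unwatermarked_scores, watermarked_scores):
    """Return (false positive rate, true positive rate) at `threshold`."""
    fpr = sum(s > threshold for s in unwatermarked_scores) / len(unwatermarked_scores)
    tpr = sum(s > threshold for s in watermarked_scores) / len(watermarked_scores)
    return fpr, tpr
```

In use, the two score lists would come from running the trained detector on held-out unwatermarked and watermarked texts truncated to the same 200-token budget. The threshold is chosen from the unwatermarked scores alone, and the watermarked scores are then used only to check how many true detections survive at that threshold.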


Detection Phase #21
