Hi! Nice project!
I ran example_python.ipynb with Qwen2.5 7B Base, set up as follows:
import sys
sys.path.append('..')
import os
from dotenv import load_dotenv  # import the library
# 1. Load the .env file (reads the .env in the current directory by default)
load_dotenv()
# Assuming we are in the root directory
from syncode import Syncode
import warnings
warnings.filterwarnings('ignore')
model_name = os.getenv('QWEN2_5_7B_PATH')
# Load the unconstrained original model
llm = Syncode(model = model_name, mode='original', max_new_tokens=200)
# Load the Syncode augmented model
syn_llm = Syncode(
    model=model_name,
    mode='grammar_mask',
    grammar='python',
    parse_output_only=False,
    indent=True,
    opp=False
)
Standard LLM generation looks like this:
partial_code = "def is_prime(n):\n '''Return if prime'''\n "
output = partial_code+llm.infer(partial_code)[0]
print(output)
with output:
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
def is_prime(n):
    '''Return if prime'''
    if n == 1:
        return False
    if n == 2:
        return True
    if n > 2 and n % 2 == 0:
        return False
    max_divisor = int(n**0.5) + 1
    for d in range(3, max_divisor, 2):
        if n % d == 0:
            return False
    return True
def is_palindrome(n):
    '''Return if palindrome'''
    return str(n) == str(n)[::-1]
def is_pandigital(n):
    '''Return if pandigital'''
    return set(str(n)) == set('123456789')[:len(str(n))]
def is_pandigital_0(n):
    '''Return if pandigital'''
    return set(str(n)) == set('0123456789')[:len(str(n))]
def is_pandigital
The same generation through the SynCode-constrained model is:
output = partial_code+syn_llm.infer(partial_code)[0]
print(output)
with output:
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
def is_prime(n):
    '''Return if prime'''
So the constrained model generates nothing beyond the prompt. It seems the grammar constraints mask every probable next token, which forces the model to terminate early. Debugging with skip_special_tokens=False at link shows that the output degenerates into repeated special tokens (e.g., <|im_start|>). This suggests that the grammar mask is rejecting all valid continuation tokens, leaving the model with no options other than special tokens.
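To make the suspected failure mode concrete, here is a toy sketch in plain Python. It is not SynCode's actual masking code: the vocabulary, token ids, logits, and the helper names `masked_argmax` and `diagnose_mask` are all made up for illustration. It only shows how greedy decoding falls through to a special token once a mask rejects every ordinary token.

```python
import math

def masked_argmax(logits, allowed):
    """Return the id of the highest-scoring token among the allowed ones."""
    best_id, best_logit = None, -math.inf
    for tok_id, logit in enumerate(logits):
        if allowed[tok_id] and logit > best_logit:
            best_id, best_logit = tok_id, logit
    return best_id

def diagnose_mask(allowed, special_ids):
    """Count allowed tokens and how many of them are ordinary (non-special)."""
    allowed_ids = [i for i, ok in enumerate(allowed) if ok]
    ordinary = [i for i in allowed_ids if i not in special_ids]
    return {"allowed": len(allowed_ids), "ordinary": len(ordinary)}

# Hypothetical six-token vocabulary; these are NOT Qwen's real token ids.
vocab = ["if", "return", "    ", "\n", "<|im_start|>", "<|endoftext|>"]
special_ids = {4, 5}
logits = [5.0, 4.0, 3.0, 2.0, -1.0, -2.0]

# A healthy mask keeps some ordinary tokens that the grammar accepts.
healthy_mask = [True, True, False, False, False, True]
# An over-restrictive mask rejects every ordinary token.
broken_mask = [False, False, False, False, True, True]

print(vocab[masked_argmax(logits, healthy_mask)])  # if
print(vocab[masked_argmax(logits, broken_mask)])   # <|im_start|>
print(diagnose_mask(broken_mask, special_ids))     # {'allowed': 2, 'ordinary': 0}
```

If the real mask at the first generation step shows zero ordinary tokens allowed, as in the `broken_mask` case, that would support the over-masking explanation. If ordinary tokens are allowed but still not chosen, the cause is more likely elsewhere.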