Skip to content

scanf.py: allow empty strings for %s format option#197

Merged
olebole merged 11 commits intomainfrom
issue196
Jan 26, 2026
Merged

scanf.py: allow empty strings for %s format option#197
olebole merged 11 commits intomainfrom
issue196

Conversation

@olebole
Copy link
Member

@olebole olebole commented Jan 14, 2026

This re-establishes

>>> scanf("%s", "")
("", )

Difference is however now

>>> scanf("%d%s", "1")
(1, "")  # original IRAF and PyRAF would return (1,)

Closes: #196

@olebole
Copy link
Member Author

olebole commented Jan 15, 2026

@jehturner

I think scanf_compile() probably just needs to return a list of compiled expressions, rather than a single, combined expression.

I implemented this in a second commit (and rebased after the now-merged #195). Can you have a look/test please?

@jehturner
Copy link
Collaborator

Thank you! I was looking into this, but will check what you've done.

This largely restructures the code. Instead of compiling everything
into one all-or-nothing regular expression it keeps all regular
expressions for the format strings separately. To scan input, it
iterates over these regexps until they no longer match.
@olebole
Copy link
Member Author

olebole commented Jan 15, 2026

With the new code, I get

>>> fmt = "%s %s"
>>> scanf(fmt, "")
('', '')
>>> scanf(fmt, "x")
('x', '')
>>> scanf(fmt, "x ")
('x', '')
>>> scanf(fmt, "x y")
('x', 'y')

resp. in PyRAF

--> testspace ""
2 [][]
--> testspace "x"
2 [x][]
--> testspace "x "
2 [x][]
--> testspace "x y"
2 [x][y]

The i.e. the conversion is the same as in IRAF/ecl, but the return code (number of conversions) differs. This is something glued into PyRAF, because the API of the PyRAF scanf function (both C module and new Python code) returns the tuple of values that matched the format, and the calculation of the return code is done elsewhere.
One could probably from the right remove all empty string except (or including) the leftmost one to resemble the original PyRAF behavior, but I am not sure whether that is correct: The space shall always convert into zero or more whitespaces; and if we agree that an empty string is a correct conversion of %s for an empty string than this is the correct result.

@olebole
Copy link
Member Author

olebole commented Jan 15, 2026

OK, I added another commit that modifies the stop condition. Now it resembles the original PyRAF behavior:

--> testspace ""
1 [][-]
--> testspace "x"
1 [x][-]
--> testspace "x "
1 [x][-]
--> testspace "x y"
2 [x][y]

@jehturner
Copy link
Collaborator

jehturner commented Jan 15, 2026

Yes, so there are long-established usage patterns that rely on IRAF's return value behaviour (eg. "%s %s" to scan a single string value and also check for unwanted spaces with n > 1). Maybe that's messy, but existing stuff would break with your penultimate example, so I'm happy to see that last one 😅.

@jehturner
Copy link
Collaborator

I think this looks pretty good. I should probably try adding some more test cases to be sure!

@jehturner
Copy link
Collaborator

I'm not sure whether this would break our stuff, but I notice some odd treatment of spaces in IRAF. With CL & PyRAF 2.2.1,

fscanf ("x y", "%s%s", outstr1, outstr2)

gives outstr1="x", outstr2="y", whereas this branch gives outstr1="x", outstr2="" (with n=2 in all cases).

@olebole
Copy link
Member Author

olebole commented Jan 16, 2026

That means that any conversion (maybe except %c shall include leading whitespaces. I could add them directly to the regexps.

@jehturner
Copy link
Collaborator

The many possible corner cases make me a bit nervous, but this seems to be working quite well. I'll try re-running our tests with it early next week.

The output handling was changed by accident when the Python scanf code was introduced
This resembles the behavior of the original sscanf C module
@olebole
Copy link
Member Author

olebole commented Jan 17, 2026

I used my Copilot time to discuss a number of test cases and found the major cause for the problem we had: I accidentally changed the handling of the scanf module output in iraffunctions.py. Specifically for zero matches, it hardcodes one empty string as return value.

This behavior is now re-established, and the regular expressions are now adjusted to behave scanf.py in the same manner as the original sscanfmodule.c. I also kept your test cases to be safe.

There are still a few differences between the C module and scanf.py (and IRAF's scanf implementation):

Format Input IRAF sscanfmodule.c scanf.py
%x 0x27 not parsed 0x27 not parsed
%d 0x27 not parsed 0x27 not parsed
%d 077 77 0o77 77
%f 12:30:00 12.5 not parsed not parsed
%i,%l, %u, %E, %X not available std. C behavior not available
INDEF indef not parsed not parsed

@jehturner
Copy link
Collaborator

It's great to see all those tests! The only thing that might worry me in that table is INDEF, but if it wasn't working before then it shouldn't break anything. I believe %x is supposed to parse IRAF's 27x, not 0x27.

@jehturner
Copy link
Collaborator

I notice some inexact float values like 3.14& 1.23e-4 in the tests without an "approximately equals" comparison, but I think those should be fine, since scanf is only doing float() on the same machine that the literals are parsed on, which is the same thing (I had stuck to powers of 2, just to be safe)?

@olebole
Copy link
Member Author

olebole commented Jan 17, 2026

For %x in the table: I meant fscanf("0x27", "%x", out) doesn't work as naively expected - this would only parse the leading zero.
For the floats, you could be right; I will adjust the non-integer parts to be powers of 0.5.

@olebole olebole force-pushed the issue196 branch 2 times, most recently from 78c3543 to 9c4d85d Compare January 17, 2026 15:08
olebole and others added 2 commits January 18, 2026 13:10
The test cases were generated in discussion with ChatGPT5o to examine
the differences in the behavior of scanf.py and the earlier
ssanfmodule.c. The fixes in scanf.py were then done to let all tests pass.
@jehturner
Copy link
Collaborator

This is looking much better in my Gemini IRAF testing so far. I'm just running some extra tests...

@jehturner
Copy link
Collaborator

Hi @olebole, this passed all of our non-interactive testing (in some quick interactive testing, I ran into another display issue, but that's unrelated), so I think it's OK. Thanks for all your effort on this!

@olebole
Copy link
Member Author

olebole commented Jan 26, 2026

Hi @jehturner that is great; I will merge!

@olebole olebole merged commit 9607ad7 into main Jan 26, 2026
5 checks passed
@olebole olebole deleted the issue196 branch January 26, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Another compatibility bug with the new scanf, when parsing blank strings

2 participants