scanf.py: allow empty strings for %s format option by olebole · Pull Request #197 · iraf-community/pyraf

olebole · 2026-01-14T06:22:03Z

This re-establishes

>>> scanf("%s", "")
("", )

Difference is however now

>>> scanf("%d%s", "1")
(1, "")  # original IRAF and PyRAF would return (1,)

Closes: #196

olebole · 2026-01-15T18:54:44Z

@jehturner

I think scanf_compile() probably just needs to return a list of compiled expressions, rather than a single, combined expression.

I implemented this in a second commit (and rebased after the now-merged #195). Can you have a look/test please?

jehturner · 2026-01-15T19:06:40Z

Thank you! I was looking into this, but will check what you've done.

This largely restructures the code. Instead of compiling everything into one all-or-nothing regular expression it keeps all regular expressions for the format strings separately. To scan input, it iterates over these regexps until they no longer match.

olebole · 2026-01-15T20:26:11Z

With the new code, I get

>>> fmt = "%s %s"
>>> scanf(fmt, "")
('', '')
>>> scanf(fmt, "x")
('x', '')
>>> scanf(fmt, "x ")
('x', '')
>>> scanf(fmt, "x y")
('x', 'y')

resp. in PyRAF

--> testspace ""
2 [][]
--> testspace "x"
2 [x][]
--> testspace "x "
2 [x][]
--> testspace "x y"
2 [x][y]

The i.e. the conversion is the same as in IRAF/ecl, but the return code (number of conversions) differs. This is something glued into PyRAF, because the API of the PyRAF scanf function (both C module and new Python code) returns the tuple of values that matched the format, and the calculation of the return code is done elsewhere.
One could probably from the right remove all empty string except (or including) the leftmost one to resemble the original PyRAF behavior, but I am not sure whether that is correct: The space shall always convert into zero or more whitespaces; and if we agree that an empty string is a correct conversion of %s for an empty string than this is the correct result.

olebole · 2026-01-15T20:37:20Z

OK, I added another commit that modifies the stop condition. Now it resembles the original PyRAF behavior:

--> testspace ""
1 [][-]
--> testspace "x"
1 [x][-]
--> testspace "x "
1 [x][-]
--> testspace "x y"
2 [x][y]

…odule

jehturner · 2026-01-15T20:48:13Z

Yes, so there are long-established usage patterns that rely on IRAF's return value behaviour (eg. "%s %s" to scan a single string value and also check for unwanted spaces with n > 1). Maybe that's messy, but existing stuff would break with your penultimate example, so I'm happy to see that last one 😅.

jehturner · 2026-01-15T22:58:14Z

I think this looks pretty good. I should probably try adding some more test cases to be sure!

jehturner · 2026-01-16T01:49:09Z

I'm not sure whether this would break our stuff, but I notice some odd treatment of spaces in IRAF. With CL & PyRAF 2.2.1,

fscanf ("x y", "%s%s", outstr1, outstr2)

gives outstr1="x", outstr2="y", whereas this branch gives outstr1="x", outstr2="" (with n=2 in all cases).

olebole · 2026-01-16T13:15:32Z

That means that any conversion (maybe except %c shall include leading whitespaces. I could add them directly to the regexps.

jehturner · 2026-01-17T01:58:35Z

The many possible corner cases make me a bit nervous, but this seems to be working quite well. I'll try re-running our tests with it early next week.

The output handling was changed by accident when the Python scanf code was introduced

This resembles the behavior of the original sscanf C module

olebole · 2026-01-17T12:43:34Z

I used my Copilot time to discuss a number of test cases and found the major cause for the problem we had: I accidentally changed the handling of the scanf module output in iraffunctions.py. Specifically for zero matches, it hardcodes one empty string as return value.

This behavior is now re-established, and the regular expressions are now adjusted to behave scanf.py in the same manner as the original sscanfmodule.c. I also kept your test cases to be safe.

There are still a few differences between the C module and scanf.py (and IRAF's scanf implementation):

Format	Input	IRAF	`sscanfmodule.c`	`scanf.py`
`%x`	`0x27`	not parsed	0x27	not parsed
`%d`	`0x27`	not parsed	0x27	not parsed
`%d`	`077`	77	0o77	77
`%f`	`12:30:00`	12.5	not parsed	not parsed
`%i`,`%l`, `%u`, `%E`, `%X`		not available	std. C behavior	not available
	`INDEF`	indef	not parsed	not parsed

jehturner · 2026-01-17T14:25:07Z

It's great to see all those tests! The only thing that might worry me in that table is INDEF, but if it wasn't working before then it shouldn't break anything. I believe %x is supposed to parse IRAF's 27x, not 0x27.

jehturner · 2026-01-17T14:35:44Z

I notice some inexact float values like 3.14& 1.23e-4 in the tests without an "approximately equals" comparison, but I think those should be fine, since scanf is only doing float() on the same machine that the literals are parsed on, which is the same thing (I had stuck to powers of 2, just to be safe)?

olebole · 2026-01-17T14:38:45Z

For %x in the table: I meant fscanf("0x27", "%x", out) doesn't work as naively expected - this would only parse the leading zero.
For the floats, you could be right; I will adjust the non-integer parts to be powers of 0.5.

The test cases were generated in discussion with ChatGPT5o to examine the differences in the behavior of scanf.py and the earlier ssanfmodule.c. The fixes in scanf.py were then done to let all tests pass.

…works

jehturner · 2026-01-19T22:49:35Z

This is looking much better in my Gemini IRAF testing so far. I'm just running some extra tests...

jehturner · 2026-01-26T18:27:23Z

Hi @olebole, this passed all of our non-interactive testing (in some quick interactive testing, I ran into another display issue, but that's unrelated), so I think it's OK. Thanks for all your effort on this!

olebole · 2026-01-26T18:29:38Z

Hi @jehturner that is great; I will merge!

scanf.py: allow empty strings for %s format option

4611b31

olebole force-pushed the issue196 branch from 6c069e0 to 75c1b9b Compare January 15, 2026 18:51

olebole force-pushed the issue196 branch from 75c1b9b to c5c9c52 Compare January 15, 2026 20:08

scanf.py: update stop condition for scan to resemble original scanf m…

5cd06d0

…odule

olebole force-pushed the issue196 branch from 7ae80be to 5cd06d0 Compare January 15, 2026 20:38

olebole added 2 commits January 16, 2026 14:36

Allow whitespaces before any expression (except %c)

78f550e

scanf: Handle skipped fields within the same format regular expression

e21faeb

olebole added 2 commits January 17, 2026 13:00

Revert to original handling of sscanf output

5770cbd

The output handling was changed by accident when the Python scanf code was introduced

scanf module shall always return a list

159b396

This resembles the behavior of the original sscanf C module

olebole force-pushed the issue196 branch from b06518f to fe4310d Compare January 17, 2026 12:18

olebole force-pushed the issue196 branch from fe4310d to 5ef3ff9 Compare January 17, 2026 13:55

olebole force-pushed the issue196 branch 2 times, most recently from 78c3543 to 9c4d85d Compare January 17, 2026 15:08

olebole and others added 2 commits January 18, 2026 13:10

Add more test cases and fix scanf.py

6f597c5

The test cases were generated in discussion with ChatGPT5o to examine the differences in the behavior of scanf.py and the earlier ssanfmodule.c. The fixes in scanf.py were then done to let all tests pass.

Add a few more test cases, including patterns used in real scripts.

513aef7

olebole force-pushed the issue196 branch from 4e0519f to d27d6e0 Compare January 18, 2026 12:10

olebole force-pushed the issue196 branch from d27d6e0 to fe30dd7 Compare January 18, 2026 12:13

olebole added 2 commits January 19, 2026 07:17

Handle constant elements of format strings as regular expressions

fa41774

Remove unnecessary parts of the code; add some comments about how it …

f7de070

…works

olebole force-pushed the issue196 branch from fe30dd7 to f7de070 Compare January 19, 2026 06:17

olebole merged commit 9607ad7 into main Jan 26, 2026
5 checks passed

olebole deleted the issue196 branch January 26, 2026 18:30

Conversation

olebole commented Jan 14, 2026

Uh oh!

olebole commented Jan 15, 2026

Uh oh!

jehturner commented Jan 15, 2026

Uh oh!

olebole commented Jan 15, 2026

Uh oh!

olebole commented Jan 15, 2026

Uh oh!

jehturner commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jehturner commented Jan 15, 2026

Uh oh!

jehturner commented Jan 16, 2026

Uh oh!

olebole commented Jan 16, 2026

Uh oh!

jehturner commented Jan 17, 2026

Uh oh!

olebole commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jehturner commented Jan 17, 2026

Uh oh!

jehturner commented Jan 17, 2026

Uh oh!

olebole commented Jan 17, 2026

Uh oh!

jehturner commented Jan 19, 2026

Uh oh!

jehturner commented Jan 26, 2026

Uh oh!

olebole commented Jan 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

jehturner commented Jan 15, 2026 •

edited

Loading

olebole commented Jan 17, 2026 •

edited

Loading