added test of else method using differently ordered columns #682
Force-pushed from a06147b to 6d235f6.
| @@ -0,0 +1,4 @@ | |||
| {"_time":"1970-01-01T00:00:00.000000001","_subsort":0,"_key_hash":17095134351192101601,"_key":"dev","joined":{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1.0,"key":"dev"},"threads":{"text":"Thread 1","user":"UCZ4","time":1.0,"thread_ts":1.0,"key":"dev"},"non_threads":null} | |||
Can you elaborate on what the expected results are here? These appear to be correct: if threads is null, then non_threads is used as the value of joined.
The expected results are correct; the test currently fails. https://github.com/kaskada-ai/kaskada/actions/runs/5926509057/job/16068060314?pr=682#step:6:211
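For reference, the rule the expected output relies on (use threads unless it is null, otherwise fall back to non_threads) can be modeled in plain Python. This is only a sketch of the semantics described in this thread, not Kaskada's implementation; the standalone `else_` function and the variable names are hypothetical, and the rows are taken from the test data and error message in this PR.

```python
def else_(primary, fallback):
    """Return primary unless it is null (None); otherwise return fallback."""
    return fallback if primary is None else primary


# Rows copied from the data quoted in this PR.
thread_row = {"text": "Thread 1", "user": "UCZ4", "time": 1, "thread_ts": 1.0, "key": "dev"}
message_row = {"text": "Msg 1", "user": "U016", "time": 3, "thread_ts": None, "key": "dev"}

# At a thread row, threads is set and non_threads is null.
joined_for_thread = else_(thread_row, None)

# At a non-thread row, threads is null, so non_threads is used.
joined_for_message = else_(None, message_row)

print(joined_for_thread["text"])
print(joined_for_message["text"])
```

Under this model, `joined` should never be a record of nulls when either input is present, which is what the golden file expects.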
| def record_source_slack() -> kd.sources.JsonlString: | ||
| content = "\n".join( | ||
| [ | ||
| """{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1,"key":"dev"}""", |
Shouldn't use """...""" here -- triple-quoted strings are conventionally used for "docstrings". Lint will likely suggest changing it.
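A quick way to check the suggestion, assuming nothing beyond the standard library: a single-quoted literal holds exactly the same JSON text, because the embedded double quotes need no escaping inside single quotes. The variable names below are only for illustration.

```python
import json

# The literal as written in the test.
triple_quoted = """{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1,"key":"dev"}"""

# The same content as an ordinary single-quoted string.
single_quoted = '{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1,"key":"dev"}'

# Both literals produce the identical string, so the source sees no difference.
same_text = triple_quoted == single_quoted

# The line still parses as one JSON record.
record = json.loads(single_quoted)
print(same_text, record["user"])
```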
| """{"text":"Msg 2","user":"U016","time":4,"thread_ts":null,"key":"dev"}""", | ||
| ] | ||
| ) | ||
| return kd.sources.JsonlString( |
PyList may be even easier for this example:
source = kd.sources.PyList(
    [
        {"time": "1996-12-19T16:39:57", "text": "Thread 2", "user": "U016", "thread_ts": 1, "key": "dev"},
        ...,
    ],
    time_column_name="time",
    key_column_name="user",
)
| threads = record_source_slack.filter(record_source_slack.col("thread_ts").is_not_null()) | ||
| non_threads = record_source_slack.filter(record_source_slack.col("thread_ts").is_null()) | ||
|
|
||
| # this call re-orders the columns in the non_threads timestream |
Looking at the error, it doesn't seem to care that the fields are re-ordered. Instead, it doesn't like row 3:
At positional index 2, first diff: {'user': None, 'text': None, 'time': None, 'thread_ts': None, 'key': None} != {'text': 'Msg 1', 'user': 'U016', 'time': 3, 'thread_ts': None, 'key': 'dev'}
| non_threads = joined.col("non_threads") | ||
|
|
||
| golden.jsonl( | ||
| joined.extend({"joined": threads.else_(non_threads)}) |
Ah. I believe the problem here is that we're creating a new record containing joined. Even when threads.else_(non_threads) is null, this record will sometimes be not null. Eg.
Two questions / thoughts:
- Why do we need the `joined` at all? What happens if you just output `threads.else_(non_threads)`?
- If we could do `joined.extend(threads.else_(non_threads))` that would also help (and it is currently planned).
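The concern about the wrapping record can be illustrated with a small plain-Python model. This is a sketch under assumptions, not how Kaskada evaluates `extend` or `else_`: both helpers below are hypothetical stand-ins, with `None` playing the role of null.

```python
def else_(primary, fallback):
    """Return primary unless it is null (None); otherwise return fallback."""
    return fallback if primary is None else primary


def extend(base, extra):
    """Hypothetical stand-in: build a new record from base plus extra fields."""
    merged = dict(base) if base is not None else {}
    merged.update(extra)
    return merged


# Both inputs are null, so the else_ result is null as well.
both_null = else_(None, None)

# Wrapping that null in a new record still yields a record.
row = extend(None, {"joined": both_null})

print(row is None)
print(row)
```

In this model the outer record exists even though its only field is null, so a row is emitted where outputting `threads.else_(non_threads)` directly would produce nothing.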
Force-pushed from 6d235f6 to 493a7a7.
I was having trouble shortening this test while still being able to reproduce the bug.
Feel free to shorten it if it still reproduces the issue.
Note that I haven't fixed the issue yet; I've only added this failing test.