updated the util code to buffer mode write #261

bhanu77prakash · 2022-04-13T22:07:13Z

As mentioned in the latest issue, I have tried to convert the write mode from line-by-line writes to buffered writes. I tried running the updated script on the toy dataset and the outputs are exactly the same.

AdrianKs · 2022-04-14T07:52:03Z

data/preprocess/util.py

+def write_triple(f, buffer, ent, rel, t, S, P, O):
    """Write a triple to a file. """
-    f.write(str(ent[t[S]]) + "\t" + str(rel[t[P]]) + "\t" + str(ent[t[O]]) + "\n")
+    buffer += f"{str(ent[t[S]])}\t{str(rel[t[P]])}\t{str(ent[t[O]])}\n"


I think it is faster to make the buffer a list of strings and join that list into a single string at the end:

    buffer_list = []
    buffer_list.append(new_triple_string)
    buffer = "".join(buffer_list)
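The list-and-join suggestion above could look roughly like the following. This is only a sketch: `format_triple` and `triples_to_text` are hypothetical names, and the default positions `S=0, P=1, O=2` are an assumption about how the triple fields are indexed, not something taken from the PR.

```python
def format_triple(ent, rel, t, S=0, P=1, O=2):
    """Format one triple as a tab-separated line, mirroring the diff above.

    ent and rel map raw names to ids; S, P, O are the positions of
    subject, predicate and object in t (assumed defaults).
    """
    return f"{ent[t[S]]}\t{rel[t[P]]}\t{ent[t[O]]}\n"


def triples_to_text(triples, ent, rel):
    """Collect the formatted lines in a list and join them once at the end."""
    buffer_list = []
    for t in triples:
        buffer_list.append(format_triple(ent, rel, t))
    # One join builds the final string in a single pass instead of
    # creating a new string object on every `+=`.
    return "".join(buffer_list)
```

Appending to a list is cheap, and the single `"".join(...)` avoids the repeated copying that string concatenation in a loop can cause.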

AdrianKs · 2022-04-14T07:52:17Z

data/preprocess/util.py

-    f.write(str(ent[t[S]]) + "\t" + str(rel[t[P]]) + "\t" + str(ent[t[O]]) + "\n")
+    buffer += f"{str(ent[t[S]])}\t{str(rel[t[P]])}\t{str(ent[t[O]])}\n"
+    if len(buffer) > BUFFER_SIZE:
+        dump_buffer_to_file(buffer, f)


Did you make sure that the remainder of the buffer is written to the file everywhere it needs to be? You did so in RawSplit, but process_triple is also called in other splits.
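One way to make the final flush hard to forget is to wrap the buffer in a small context manager, so that leftover lines are written whenever the block exits. The sketch below is an illustration only: `BufferedTripleWriter` and its `buffer_size` default are hypothetical and not part of the PR or of the existing util code.

```python
class BufferedTripleWriter:
    """Hypothetical helper: buffers lines and writes the remainder on exit."""

    def __init__(self, f, buffer_size=1_000_000):
        self.f = f
        self.buffer_size = buffer_size  # threshold in characters (assumed value)
        self._parts = []
        self._length = 0

    def write(self, line):
        """Queue a line; dump the buffer once it exceeds the threshold."""
        self._parts.append(line)
        self._length += len(line)
        if self._length > self.buffer_size:
            self.flush()

    def flush(self):
        """Write everything buffered so far to the underlying file."""
        if self._parts:
            self.f.write("".join(self._parts))
            self._parts.clear()
            self._length = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always write what is left, regardless of which split used the writer.
        self.flush()
        return False
```

With this shape, each split would use `with BufferedTripleWriter(f) as w:` and call `w.write(...)` per triple, so no call site needs its own explicit dump at the end.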

AdrianKs · 2022-04-14T07:53:15Z

Thanks, this is a good idea. Some small changes are needed though.

updated the util code to buffer mode write

8c0a96d
