libpython: add to/from numpy functions#423

Open
ninsbl wants to merge 6 commits into OSGeo:main from ninsbl:numpy_pygrass

Conversation

@ninsbl
Member

@ninsbl ninsbl commented Mar 16, 2020

This PR would add two new functions:

  1. to parse e.g. stdout into a numpy array and
  2. to write a numpy array to a table in the DB backend

I would very much appreciate thorough review of the two functions, esp. the one writing to DB.
Also hints on how to properly add examples for doctest (or other unit-tests) would be very welcome...

@ninsbl ninsbl added the enhancement New feature or request label Mar 16, 2020
@ninsbl ninsbl requested review from wenzeslaus and zarch March 16, 2020 23:00
@ninsbl
Member Author

ninsbl commented Mar 18, 2020

@metzm
Contributor

metzm commented Mar 29, 2020

In this PR you are using the Python interfaces to SQLite3 and PG. Have you tested the alternative using db.execute? The reason I am asking is that the GRASS db drivers have a lot of error handling that might be missing from the Python interfaces to the respective DB drivers. Using the Python interfaces could cause cryptic errors if something goes wrong that might be better explained by the GRASS db drivers.

@wenzeslaus
Member

Also hints on how to properly add examples for doctest (or other unit-tests) would be very welcome...

Any general Python testing instructions should be applicable like this or this. You should probably focus just on SQLite, because PostgreSQL is more difficult to set up for the tests (although it is possible).

If working within the NC SPM location won't work for your tests, you can try to write a plain Python unittest test and use e.g. --exec to run GRASS (the testing framework is from the times before --exec, so it does not integrate with it well, but I already had to use this approach here).
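
As a rough illustration of the plain unittest route, a test file could look like the sketch below. The `parse_table` function is a hypothetical stand-in for the function under test, and the sample data is made up; nothing here is taken from the PR itself.

```python
import unittest
from io import StringIO

import numpy as np


def parse_table(text, sep=","):
    """Hypothetical stand-in for the function under test."""
    return np.genfromtxt(
        StringIO(text), delimiter=sep, names=True, dtype=None, encoding="utf-8"
    )


class TestParseTable(unittest.TestCase):
    def test_column_names_and_values(self):
        table = parse_table("cat,value\n1,2.5\n2,3.5\n")
        # Column names come from the header line.
        self.assertEqual(table.dtype.names, ("cat", "value"))
        # tolist() yields plain Python values, which compare cleanly.
        self.assertEqual(table["cat"].tolist(), [1, 2])
        self.assertEqual(table["value"].tolist(), [2.5, 3.5])
```

A file like this can be run with `python -m unittest`, and the same command could be passed to GRASS through --exec when the test needs a session.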

@ninsbl
Member Author

ninsbl commented May 11, 2020

In this PR you are using the python interface to SQLite3 and PG. Have you tested the alternative using db.execute?

I have not tested that. The Python interfaces offer some additional functionality and direct translation between Python objects and database data types. DB handling in pygrass (where these functions are supposed to end up) also uses Python DB adapters. But it is a good point to double check that potential errors are caught properly!

"""
sql_to_dtype = {
"sqlite": {
"INTEGER": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10],
Member

Do these really need to be numbers and not things like numpy.int32?
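
One possible reading of this suggestion, as a minimal sketch: decide the SQL type from the NumPy type hierarchy instead of from numeric codes. The helper name `sqlite_type_for` and the chosen type mapping are assumptions for illustration, not code from the PR.

```python
import numpy as np


def sqlite_type_for(dtype):
    """Return an SQLite column type for a NumPy dtype (illustrative mapping)."""
    dtype = np.dtype(dtype)
    # Booleans and all integer widths are stored as INTEGER.
    if np.issubdtype(dtype, np.bool_) or np.issubdtype(dtype, np.integer):
        return "INTEGER"
    # All floating point widths are stored as REAL.
    if np.issubdtype(dtype, np.floating):
        return "REAL"
    # Unicode strings are stored as TEXT.
    if np.issubdtype(dtype, np.str_):
        return "TEXT"
    raise TypeError(f"No SQLite type known for dtype {dtype}")
```

With this approach, `sqlite_type_for(np.int32)` and `sqlite_type_for("i8")` both resolve through `np.integer`, so the individual integer types do not need to be listed.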

Comment on lines 801 to 804
insert_sql = "INSERT INTO {}({}) VALUES %s;".format(
table,
", ".join(structured_array.dtype.names),
",".join(["?"] * len(structured_array.dtype.names)),
Member

Use named placeholders such as {table} and table= when there is more than one item to avoid confusion. There seems to be some confusion here between %s and the third argument.
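
A minimal sketch of what named placeholders could look like here, using an in-memory SQLite database. The table name, column names and data are made up for illustration. Table and column names cannot be bound as SQL parameters, so they still go through string formatting and need to be trusted or validated.

```python
import sqlite3

import numpy as np

# Hypothetical data and table name, for illustration only.
structured_array = np.array(
    [(1, 2.5), (2, 3.5)], dtype=[("cat", "i4"), ("value", "f8")]
)
table = "example_table"

columns = structured_array.dtype.names
# Each format field is named, so every argument has an obvious target.
insert_sql = "INSERT INTO {table} ({columns}) VALUES ({placeholders});".format(
    table=table,
    columns=", ".join(columns),
    placeholders=", ".join(["?"] * len(columns)),
)

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE example_table (cat INTEGER, value REAL)")
# tolist() converts NumPy scalars to plain Python values that sqlite3 can bind.
connection.executemany(insert_sql, structured_array.tolist())
connection.commit()
rows = connection.execute(
    "SELECT cat, value FROM example_table ORDER BY cat"
).fetchall()
```

The values themselves are bound through the `?` parameters rather than being formatted into the statement.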

@neteler neteler added the Python Related code is in Python label Dec 9, 2021
@neteler neteler added this to the 8.0.1 milestone Dec 9, 2021
@ninsbl ninsbl modified the milestones: 8.0.1, 8.2.0 Feb 20, 2022
@wenzeslaus wenzeslaus modified the milestones: 8.2.0, 8.4.0 Mar 16, 2022
@wenzeslaus wenzeslaus modified the milestones: 8.3.0, 8.4.0 Feb 10, 2023
@neteler neteler changed the title add to/from numpy functions libpython: add to/from numpy functions Nov 7, 2023
@neteler
Member

neteler commented Nov 7, 2023

@ninsbl: would you mind rebasing this PR?

@wenzeslaus wenzeslaus modified the milestones: 8.4.0, 8.5.0 Apr 26, 2024
@echoix echoix added the conflicts/needs rebase Rebase to or merge with the latest base branch is needed label Jul 3, 2024
@github-actions github-actions bot added GUI wxGUI related docker Docker related CI Continuous integration raster Related to raster data processing temporal Related to temporal data processing C Related code is in C labels Aug 30, 2024
@echoix echoix removed temporal Related to temporal data processing Python Related code is in Python C Related code is in C C++ Related code is in C++ HTML Related code is in HTML CSS Related code is in CSS database Related to database management RFC Request For Comment (RFC) document libraries module general display imagery tests Related to Test Suite raster3d notebook labels Oct 31, 2024
@echoix echoix added the conflicts/needs rebase Rebase to or merge with the latest base branch is needed label Feb 26, 2025
@github-actions github-actions bot added Python Related code is in Python libraries labels Aug 10, 2025
@echoix echoix removed the conflicts/needs rebase Rebase to or merge with the latest base branch is needed label Aug 10, 2025
@echoix
Member

echoix commented Aug 10, 2025

Fixed the bad merge by undoing that commit, doing a reset, then merging upstream/main again. Now there aren't 5000+ files changed with 2000+ commits, just the single file actually changed, with about 3 commits.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ninsbl and others added 2 commits August 10, 2025 22:48
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Member

@wenzeslaus wenzeslaus left a comment

@ninsbl Can you please review this older PR to see how/if your motivation for this changed over time?

The two functions are related through the original motivation but are independent as they stand now, and they have different levels of complexity, so splitting the PR into two would help to merge at least parts of this faster.

Comment on lines +495 to +497
if type(tablestring).__name__ == "str":
tablestring = grasscore.encode(tablestring, encoding=encoding)
elif type(tablestring).__name__ != "bytes":
Member

Why not test the types or instances (isinstance) instead of comparing type names as strings?
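
For illustration, the check could be sketched with isinstance as below. The helper name `to_bytes` and the UTF-8 default are assumptions, and `str.encode` stands in for the `grasscore.encode` call used in the PR.

```python
def to_bytes(tablestring, encoding="utf-8"):
    """Return tablestring as bytes, accepting str or bytes input."""
    if isinstance(tablestring, str):
        # Plain str.encode is used here instead of the GRASS helper.
        return tablestring.encode(encoding)
    if isinstance(tablestring, bytes):
        return tablestring
    raise TypeError(
        "Expected str or bytes, got {}".format(type(tablestring).__name__)
    )
```

Unlike a comparison of type names, isinstance also accepts subclasses of str and bytes.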

if structured:
kwargs["dtype"] = None

return np.genfromtxt(BytesIO(tablestring), **kwargs)
Member

What is the new function adding on top of np.genfromtxt, just the encoding?
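
For comparison, a direct call already handles bytes input, a custom separator, header names and column selection, as this sketch with made-up sample data suggests. `delimiter`, `names`, `comments`, `usecols`, `missing_values`, `filling_values` and `encoding` are all existing np.genfromtxt parameters.

```python
from io import BytesIO

import numpy as np

# Made-up sample of pipe-separated tool output, as bytes.
raw = b"cat|value|label\n1|2.5|a\n2|3.5|b\n"

table = np.genfromtxt(
    BytesIO(raw),
    delimiter="|",
    names=True,  # take field names from the header line
    dtype=None,  # infer a type per column, giving a structured array
    usecols=(0, 1),  # keep only the first two columns
    encoding="utf-8",  # decode the bytes input
)
```

The result is a structured array whose fields can be accessed by name, for example `table["value"]`.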

Comment on lines +439 to +449
def txt2numpy(
tablestring,
sep=",",
names=None,
null_value=None,
fill_value=None,
comments="#",
usecols=None,
encoding=None,
structured=True,
):
Member

Put this one into a separate PR. It seems much easier to get merged, and perhaps it could be integrated into grass.tools if really useful. Or maybe improvements in format=json and format=csv when used together with np.genfromtxt are a better route?

Comment on lines +515 to +524
def numpy2table(
np_array,
table,
connection,
formats=None,
names=False,
column_prefix="column",
update_formats=True,
overwrite=True,
):
Member

If the motivation is the same as the referenced https://trac.osgeo.org/grass/ticket/3639, wouldn't a tool which takes a CSV table and puts it into a database table be more fitting? Like db.in.ogr but specialized for CSV (db.in.table or db.in.csv).

@wenzeslaus wenzeslaus modified the milestones: 8.5.0, 8.6.0 Dec 15, 2025
Labels

libraries Python Related code is in Python

5 participants