-
Notifications
You must be signed in to change notification settings - Fork 187
[alpha] move from twisted to asyncio + new network #930
Conversation
- add hack to work around max 1 request per hash session limit causing out of sync - more logic to shutdown cleanly (+9 squashed commits) Squashed commits: [59c5916] - move mempool out of nodemanager - add relay cache + logic - add getdata support in node for relay cache items only [1d081e9] add helper logic to avoid trying to persist blocks multiple times and keeping wallet height in check [c20cd6e] fix node.relay() , remove property for size in message [383ad47] - Add new ping payload support - resolve send_addr issue - cleanup unused functions, add missing typing [8bed164] clean up prints to use logger, remove obsolete code [9791737] Update db schema, ensure setting up an asyncio loop before initialising a DB, don't manually stop loop after run complete [1499f91] - update requirements - fix /examples/ - fix `set maxpeers` not adding new CLI clients - fix exceeding max nodes connection count - add IPfiltering (blacklist/whitelist) - fix show nodes extra flags not working with 0 connected nodes - add code for more graceful shutdown preventing stack traces and db corruption - update trouble shooting section after OSX update when `scrypt` module fails. [2743930] cleanup [ee2f4c6] New network code + move from twisted to asyncio cleanup - update requirements - fix /examples/ - fix `set maxpeers` not adding new CLI clients - fix exceeding max nodes connection count - add IPfiltering (blacklist/whitelist) - fix show nodes extra flags not working with 0 connected nodes - add code for more graceful shutdown preventing stack traces and db corruption - update trouble shooting section after OSX update when `scrypt` module fails. Update db schema, ensure setting up an asyncio loop before initialising a DB, don't manually stop loop after run complete clean up prints to use logger, remove obsolete code - Add new ping payload support - resolve send_addr issue - cleanup unused functions, add missing typing fix node.relay() , remove property for size in message add helper logic to avoid trying to persist blocks multiple times and keeping wallet height in check - move mempool out of nodemanager - add relay cache + logic - add getdata support in node for relay cache items only - add hack to work around max 1 request per hash session limit causing out of sync - more logic to shutdown cleanly
- cleanup logger stuff - ensure we can't persist blocks that are out of order - remove tcp session hack
|
Awesome! :) I will absolutely start reviewing/testing this. As you know, this will take me some time. |
|
NOTE: I am going to enter check boxes next to my recommendations in case you would like to use them as a reference. A checkbox in my mind means the comment/recommendations has been addressed either with a fix or some other justification.
|
|
Running testnet api_server:
|
|
Whenever I query the server, the response is immediate even while syncing! 💪 However, for the Additionally, I have edited
|
|
When shutting down the server, I see
|
|
It seems like after awhile the server loses sync and isn't able to regain it.
|
|
Thanks, really good input!
not or not yet successfully done
|
My edit is very basic because I like to see exactly what the requests are, so I put this line at the beginning of What I am asking for is to reintroduce the code that produces lines like for each request. I can't seem to find |
|
ah I see. The |
Your update did not resolve the issue; however, it looks like a different unhandled exception now: I was able to get it to work with by suppressing every and in On a related note, did you happen to see this for graceful shutdowns? Not sure if it helps. I thought the |
imo the useful information inherent with an IP address overshadows its size. I am still in favor of just using the IP address. Also, it appears that your rename update did not take everywhere: |
I don't think I was originally seeing all the debug statements and my original review of the code was only cursory (also, it is a lot of changes). I can see now this loop, which is interesting: I think that |
|
The asyncio_server is ~an order of magnitude more efficient than the current server (comparing CPU usage)! 💪 |
jseagrave21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few items I found
| except SystemExit: | ||
| logger.info("Shutting down...") | ||
| site = main_task.result() | ||
| loop.run_until_complete(site.stop()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will result in
AttributeError: 'NoneType' object has no attribute 'stop'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reproduce this? If so what are the steps? It doesn't seem to happen here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I realized my mistake. I actually added this onto the code for np-api-server so that is probably why it didn't work. Sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here fa0cfb9#diff-8aa8dec2a1f7f08ba100b97063d796eaR33
you meant with suppress((asyncio.CancelledError, Exception)):
|
@ixje please see the latest commit in https://github.com/ixje/neo-python/pull/5 which fixes the shutdown issue. Please note, I didn't push to a separate PR because the shutdown improvements also affected |
jseagrave21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these notifications are better served as debug statements because they aren't cuing any action from the user.
| self.connection_lost(ConnectionError) | ||
| self.disconnect() | ||
| except asyncio.CancelledError: | ||
| print("task cancelled, closing connection") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be debug statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding these print statements (and those below) . Yes, eventually they shouldn't be there at all. I think even including them with logger.debug might generate too much noise. I'll removing them as a TODO but want to keep them for now because I think they're useful at this stage
| self.connection_lost(ConnectionResetError) | ||
| self.disconnect() | ||
| except ConnectionError: | ||
| print("connection error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
| self._stream_writer.write(message.to_array()) | ||
| await self._stream_writer.drain() | ||
| except ConnectionResetError: | ||
| print("connection reset") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
| self.disconnect() | ||
| except Exception as e: | ||
| self.connection_lost() | ||
| print(f"***** woah what happened here?! {str(e)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
|
I got this deprecation warning when testing on a different computer and downloading the test fixture:
|
jseagrave21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some comments from my second pass through, now having a slightly better understanding of asyncio.
| passwd = prompt("[password]> ", is_password=True) | ||
| except KeyboardInterrupt: | ||
| print("Wallet opening cancelled") | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, added it back. I was probably trying out shutdown approaches and this might have been a test that failed with one of the approaches
|
|
||
| relayed = NodeLeader.Instance().Relay(contract_tx) | ||
| nodemgr = NodeManager() | ||
| # this blocks, consider moving this wallet function to async instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this instance of relaying somehow different than the others? I am just wondering about the comment and this is the first time I've seen it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After finishing reading through the changes, I saw one more similar comment as a "TODO". Is the relay func something that should be improved before release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has todo with slowly migrating to asyncio. relay in
| relayed = nodemgr.relay(contract_tx) |
should ideally be non-blocking. But because we're calling it from a function that is not defined as
async def we cannot turn def relay() into async def relay and await it. Therefore relay looks like
neo-python/neo/Network/neonetwork/network/nodemanager.py
Lines 282 to 289 in fa0cfb9
| def relay(self, inventory) -> bool: | |
| if type(inventory) is OrigTransaction or issubclass(type(inventory), OrigTransaction): | |
| success = self.mempool.add_transaction(inventory) | |
| if not success: | |
| return False | |
| # TODO: should we keep the tx in the mempool if relaying failed? There is currently no mechanism that retries sending failed tx's | |
| return wait_for(self.relay_directly(inventory)) |
where wait_for is a helper function that can call "awaitable" code from non async code.
It does not have to be improved before release. There is too much code still to port to async. That's why this wait_for exists to help in migrating over time. The advice is to only use asyncio when starting a new project, but there are cases like this where you'll have to slowly migrate. I got this wait_for trick from a guy at Instagram where they're also slowly migrating their code base to asyncio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Okay, sounds good.
| claim_tx, relayed = ClaimGas(wallet) | ||
| self.assertEqual(claim_tx, None) | ||
| self.assertFalse(relayed) | ||
| self.assertIn("Claim transaction cancelled", mock_print.getvalue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this test deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing same reason as the wallet cancellation. Added it back.
| self.assertEqual(leader.Peers[0].address, test_node.address) | ||
| self.assertTrue(mock_disconnect.called) | ||
| self.assertIn(f"Maxpeers set to {settings.CONNECTED_PEER_MAX}", mock_print.getvalue()) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that the maxpeers command was refactored but the logic added in #878 is still there so why is this test removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I noticed that there is now a minpeers attribute but no associated config minpeers or tests. Is this something I could/should add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This asyncio branch no longer has a NodeLeader class so this logic can't work and has been refactored (wait for new asyncio PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am still a little confused. Does this mean that you already have the new code but are just waiting to commit in a separate PR?
| nodemgr.reset_for_test() | ||
|
|
||
| def test_get_version(self): | ||
| # TODO: what's the nonce? on testnet live server response it's always 771199013 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we assert the nonce value in the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
I finally caught up with the 20+ commits/PR's they did on the
Can you describe the steps that produce this error? I searched the code base for any importing from |
|
Okay, sounds good.
I received this warning twice while testing on a new computer which didn't have the test fixtures downloaded. The warning was received each time a test fixture was downloaded. I am not sure if that is related to the warning but that was when it occured. -edit- @ixje |
|
closing in favour of #934 |
|
400 points seagrave He has been doing a lot extensive testing and reviewing for this PR. On top of that he has been creating PR's directly to my 'private repo' to address certain issues. This is an ongoing effort for one of the biggest changes to The process will be continued in #934 |
This PR ripped the 3rd party event-driven framework
twistedout ofneo-pythonand replaced it with the standard libraryasyncio. It also introduces new network code.neo-pythondoesn't have 100% test coverage and this PR touches pretty much every component, therefore I'm looking for help with general testing of this branch. This can be anything from running the RPC server, to general interactions like opening/closing/rebuilding wallets, sending assets, searching contracts. Whatever feature is in there should have no regression and function as expected. If it shows unexpected behaviour (read the note below first) then please drop a comment here, also for any errors you think shouldn't be there/printed .Important notes
np-bootstrapas suggested will not work as the bootstrap files have not yet been updated.neo-cli->v2.10.1. Therefore when using this in a private-net environment make sure you run it against nodes withneo-cliv2.10.1. Upgradeneo-privatenetusing:neo-localis not yet updated afaik.To elaborate why this is important: the
>= 2.9.x <= 2.10.0series ofneo-clistarted tracking data sent to a peer and limit requesting the same data to only once per session. In order to keep up to date with the best height of a connected node in those series we have 2 optionsI found neither approach acceptable and convinced NEO to add a new network command that allows requesting the best height of a peer without disconnecting. Without going into too much detail, the new network code uses this information to avoid requesting data from a node that doesn't have the data in the first place. The impact for now is that neo-cli nodes <
2.10.1basically cannot stay in sync. It will be a transition period that we'll have to live with.