Releases: nv-legate/legate
v24.06.00
This release re-implements the Legate API in C++, which significantly reduces the overhead of the control code. This release also introduces the following major features:
- As a result of the C++ re-implementation of the API, the entire Legate program can now be written in C++ (previously the control code had to be written in Python).
- The Legate Array API, which extends Legate Stores with support for struct-type and nullable containers, and even containers of variable-length elements (e.g. string containers and sparse array representations).
- An implementation of STL algorithms based on the Legate API, which allows users to easily express common parallelism patterns without needing to write custom tasks.
- Support for writing leaf tasks in Python (previously only leaf task implementations in C++ were supported).
- Integration with Nsight Systems (initial support).
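Nullable, variable-length containers such as string columns are commonly laid out as one flat data buffer, a per-element [start, end) range into that buffer, and a separate validity mask for nulls. The Python sketch below illustrates only that general layout under this assumption; `StringColumn` and its fields are hypothetical names and are not the Legate Array API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StringColumn:
    """Hypothetical flat layout for a nullable column of variable-length strings."""

    chars: bytes                   # all string payloads concatenated
    ranges: List[Tuple[int, int]]  # [start, end) offsets into `chars`, one per element
    valid: List[bool]              # validity mask: False marks a null element

    @classmethod
    def from_list(cls, values: List[Optional[str]]) -> "StringColumn":
        chars = bytearray()
        ranges: List[Tuple[int, int]] = []
        valid: List[bool] = []
        for v in values:
            start = len(chars)
            if v is not None:
                chars.extend(v.encode("utf-8"))
            # A null element still gets an (empty) range so indices stay aligned.
            ranges.append((start, len(chars)))
            valid.append(v is not None)
        return cls(bytes(chars), ranges, valid)

    def get(self, i: int) -> Optional[str]:
        if not self.valid[i]:
            return None
        start, end = self.ranges[i]
        return self.chars[start:end].decode("utf-8")


col = StringColumn.from_list(["ab", None, "cde"])
```

Keeping payloads in one contiguous buffer lets the whole column be partitioned and moved as a few dense arrays instead of many small objects.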
This release raises the minimum supported CUDA version to 12.0.
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.06/eula.pdf. x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/legate-core.
Documentation for this release can be found at https://docs.nvidia.com/legate/24.06/.
v23.11.00
This release focuses on bugfixes and documentation improvements, in particular a formally documented support matrix.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🛠️ Improvements
- Use repository variables as possible. by @mag1cp1n in #839
- Expand ranges when reading thread_siblings_list by @vzhurba01 in #849
- Use testdata to remove duplicate test dictionary by @vzhurba01 in #851
- Add a launcher option to the tester by @marcinz in #825
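On Linux, the sysfs file `thread_siblings_list` (under `/sys/devices/system/cpu/cpuN/topology/`) uses the kernel's CPU-list format, which can contain ranges such as `0-3` as well as comma-separated single CPUs, so a reader has to expand the ranges. A minimal sketch of that expansion, assuming this format; `expand_cpu_list` is a hypothetical helper, not the code from #849.

```python
from typing import List


def expand_cpu_list(text: str) -> List[int]:
    """Expand a Linux CPU list such as "0-3,8,10-11" into individual CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


siblings = expand_cpu_list("0-3,8,10-11\n")
```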
🐛 Bug Fixes
- Avoid gc infinite loop at runtime destruction time by @manopapad in #842
- Add missing 12.0 CUDA libraries to env generation script by @manopapad in #850
- Set Mypy version downloaded in CI by @Jacobfaib in #859
- Remove numpy from conda build dependencies. by @bdice in #855
- Control ucx presence in install_info more carefully by @bryevdv in #882
📖 Documentation
- Document support matrix by @manopapad in #852
- API reference for resource scoping by @magnatelee in #857
- Suggest using mamba over conda by @manopapad in #881
New Contributors
- @mag1cp1n made their first contribution in #839
- @vzhurba01 made their first contribution in #849
- @bdice made their first contribution in #855
- @trivialfis made their first contribution in #861
Full Changelog: v23.09.00...v23.11.00
v23.09.00
This release includes a number of bug fixes for multi-process execution, and quality-of-life improvements to the build system and driver script.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🛠️ Improvements
- Add support for custom wrappers by @bryevdv in #813
- Make NCCL warm-up optional by @magnatelee in #815
- Enable symbols on REALM_BACKTRACE through libdw by @manopapad in #742
- Clean up reduction store init using the new future map reduction API by @magnatelee in #821
- Use Legion with CMake's native CUDA language by @trxcllnt in #828
- Auto-detect multi-node based on env vars by @bryevdv in #832
🐛 Bug Fixes
- Pre-seed random number generators deterministically, to guard against control replication violations by @ipdemes in #809
- Enable shard-local future creation for IO by @ipdemes in #835
- Respect user-supplied PYTHONPATH by @bryevdv in #836
- Use unordered detach operations by @ipdemes in #823
- Fix oversubscription support in sharding functors by @ipdemes in #819
- Respect the type of passed storage in create_store by @manopapad in #834
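Under control replication, every shard runs the same control code and is expected to issue the same sequence of operations, so any random numbers drawn in that code must match across shards. The Python sketch below shows the idea behind pre-seeding deterministically, assuming a fixed seed shared by all shards; `replicated_control_step` is hypothetical and does not reproduce the change in #809.

```python
import random
from typing import List


def replicated_control_step(shard_id: int, seed: int = 42) -> List[float]:
    # Each shard executes this same control code. Seeding a private generator
    # from a fixed value that does NOT depend on shard_id keeps every shard's
    # random stream identical, so all shards make the same decisions.
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]


# Simulate four shards replaying the same control step.
outputs = [replicated_control_step(shard) for shard in range(4)]
```

Seeding from the shard id, or leaving the generator unseeded, would let shards diverge, which is exactly the kind of control replication violation the fix guards against.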
New Contributors
- @ajschmidt8 made their first contribution in #826
Full Changelog: v23.07.00...v23.09.00
v23.07.00
This release introduces support for resource scoping annotations, which allow parts of the program to be assigned to a subset of the available processors/GPUs. This release also includes more examples of writing Legate libraries, improved logging and safety checks, and a refactoring of legate.core's internals.
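Resource scoping can be pictured as a stack of processor sets in which each nested scope narrows the enclosing one. The pure-Python sketch below models that idea with a context manager; `scope`, `current_gpus`, and the eight-GPU machine are hypothetical, and the rule that a nested scope is intersected with its parent is an assumption of this sketch, not a description of Legate's API.

```python
from contextlib import contextmanager
from typing import Iterator, List, Sequence

# Hypothetical stack of GPU sets; the top entry is the active scope.
_scopes: List[Sequence[int]] = [range(8)]  # assume 8 GPUs are available


@contextmanager
def scope(gpus: Sequence[int]) -> Iterator[List[int]]:
    # Assumption: a nested scope may only narrow the enclosing one.
    current = [g for g in gpus if g in _scopes[-1]]
    if not current:
        raise ValueError("scope does not overlap the enclosing scope")
    _scopes.append(current)
    try:
        yield current
    finally:
        _scopes.pop()  # restore the enclosing scope on exit


def current_gpus() -> List[int]:
    return list(_scopes[-1])


with scope(range(0, 4)):
    with scope([2, 3, 4, 5]):
        inner = current_gpus()  # only GPUs 2 and 3 lie in both scopes
outer = current_gpus()          # back to all 8 GPUs
```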
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🚀 New Features
- Add per-library loggers at the python level by @manopapad in #639
- Resource scoping by @magnatelee in #457
🛠️ Improvements
- Add support for Python 3.11 (#608) by @marcinz in #615
- Rename variables and functions to make them clearer by @magnatelee in #627
- Use subsumption checks for instance policies by @magnatelee in #626
- Add provenance information to nvtx ranges by @manopapad in #654
- Use parent frame indiscriminately in nested provenance by @manopapad in #666
- Safe vector accesses in examples by @magnatelee in #681
- Task variant registry by @magnatelee in #675
- Mapper refactoring by @magnatelee in #676
- adding flag for valgrind by @ipdemes in #686
- Core type system by @magnatelee in #697
- Revise CMAKE helper functions to support custom Python paths. by @csadorf in #702
- Error-out if multi-rank run is started on build w/o networking by @manopapad in #734
- Add specialized constructors and safety checks to `legate::Scalar` by @magnatelee in #736
- Stop tracking callbacks by @magnatelee in #748
- Add --ranks-per-node option to tester by @bryevdv in #749
- Add support for test timeouts by @bryevdv in #756
- Add support for --gasnet-system by @bryevdv in #758
- Mapper unification by @magnatelee in #763
- Add simple --last-failed option by @bryevdv in #762
- Opt-out validation for C++ accessor types and dimension by @RAMitchell in #745
- More error checking for stores by @magnatelee in #784
- Use stable UIDs for common fixed-size array types by @magnatelee in #785
📖 Documentation
- Update info on using standard python interpreter by @manopapad in #628
- Disambiguate some flags in BUILD.md by @manopapad in #641
- Guard against attaching to non-contiguous buffers by @manopapad in #653
- Fix documentation issues by @marcinz in #655
- Note new minimum CUDA requirements for conda packages by @manopapad in #673
- Document read-only / env-only settings by @bryevdv in #684
- Document a case where the communicators list may be empty by @manopapad in #708
- Reduction example by @magnatelee in #660
- IO example by @magnatelee in #633
🐛 Bug Fixes
- Tutorial editable install fix by @jjwilke in #610
- Make lgpatch UX consistent with driver by @bryevdv in #617
- More robust nsys --sample flag with --nsys-extra by @jjwilke in #618
- Fix example build tests by @jjwilke in #646
- Don't use traceback.walk_stack(None) by @manopapad in #661
- Skip provenance from NVTX range if empty by @manopapad in #657
- Make `legate::is_floating_point` hold for float16 by @magnatelee in #692
- Fix the mapping of Futures in the BaseMapper by @manopapad in #671
- Add a missing include to cmake for legate helper functions by @marcinz in #693
- Fix CMake template directories to use current_dir for subfolders by @jjwilke in #688
- Not all task.futures are backing Stores by @manopapad in #700
- Fix off-by-one errors in resource scoping code by @manopapad in #714
- Fix a "file-not-found" bug during repeated editable installs by @manopapad in #716
- Minor fix for type construction in Scalar by @magnatelee in #719
- Make `tree_reduce` reuse the existing partition by @magnatelee in #699
- Fix bugs in corner cases of `tree_reduce` by @magnatelee in #731
- Make sure local fields are not enabled for any Python interpreter by @magnatelee in #730
- Fixes for resource scoping by @magnatelee in #726
- Don't automatically close dlopen'ed .so's of Legate libs by @manopapad in #733
- Fix error w/ disable mpi setting by @bryevdv in #743
- Fix the broken unit test for machine objects by @magnatelee in #747
- site.getsitepackages() returns a list of paths, not a path by @ericniebler in #767
- avoid undefined behavior in `Span::end` by @ericniebler in #772
- Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #766
- Collective fix by @ipdemes in #687
- Constrain OpenBLAS version, to work around legion#1500 by @manopapad in #782
- avoid using nvtx domain separator @ in nvtx ranges by @jjwilke in #790
- Pin host compilers to 11.* during environment generation by @m3vaz in #791
New Contributors
- @csadorf made their first contribution in #702
- @ericniebler made their first contribution in #767
- @RAMitchell made their first contribution in #745
Full Changelog: v23.03.00...v23.07.00
v23.03.00
This is the beta release of Legate Core.
This release focuses on making it easier for developers to get started building libraries on top of Legate Core, including updated API documentation, helper CMake functions for bootstrapping new Legate library projects, and a new "Hello World" library example that demonstrates the use of fundamental Legate API calls.
This release also adds support for running Legate programs with the standard Python interpreter (in addition to the custom legate driver script).
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Mappers should skip collective views with no suitable instance by @magnatelee in #559
- don't use sys.argv for plain python init by @bryevdv in #569
- Add `--numamem` to the tester by @magnatelee in #576
- Add nvml dependency in the conda build script to get the headers for realm by @m3vaz in #586
- Fix is_complete_for check by @manopapad in #587
- Fixes for running cuNumeric CI multi-node by @manopapad in #597
- Fix ucx:tls_host default value by @SeyedMir in #592
- Fix a bug in the new registration callback API by @magnatelee in #603
🚀 New Features
- Default python interpreter support for Legate by @eddy16112 in #539
- Build helper functions for legate projects, legate-hello example by @jjwilke in #571
🛠️ Improvements
- Update the architectures built in conda package by @marcinz in #545
- NVTX: Use RangePush and Domain by @evanramos-nvidia in #293
- Refactoring changes by @magnatelee in #581
- Fix C++ warnings, virtual destructor bugs, and style issues by @jjwilke in #591
- Add CTK stubs dir to implicit link directories by @trxcllnt in #599
- Pin Legion to specific commit sha by default by @trxcllnt in #593
- Add support for Python 3.11 by @m3vaz in #608
📖 Documentation
- Update Build.md to add the missing dependency, rust by @natsukium in #565
- Document DeferredBuffer.destroy() lifetime issues in CUDA tasks by @manopapad in #566
- API reference by @magnatelee in #563
- More informative OOM message by @manopapad in #604
New Contributors
- @evanramos-nvidia made their first contribution in #293
- @natsukium made their first contribution in #565
Full Changelog: v23.01.00...v23.03.00
v23.01.00
This release adds initial support for using the UCX Realm networking backend (for more efficient multi-node communication) and using Legion's new "collective views" feature (for improved scheduling of reduction operations). Both features are currently in preview mode and not enabled by default. They are planned to become the default in the next release, following further verification and tuning.
This release improves the build experience for developers, with fixes to corner cases in the CMake configuration, a rewrite of the build documentation, and a script for generating complete conda environments for development, covering all supported platforms.
This release also introduces improvements to the user interface (improved Jupyter support, more CLI options for debugging and profiling), memory usage (through better instance management in the mapper) and the Legate programming model (allowing libraries to add custom profiler annotations and use arbitrary communicator libraries in their tasks).
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Legion bug WAR: don't instantiate futures on framebuffer by @manopapad in #413
- Handle conflicts for library-level args by @bryevdv in #416
- Fix Transform class hierarchy by @manopapad in #427
- Handle scalar outputs correctly in manual tasks by @magnatelee in #432
- Explicitly build Legion if `legion_dir` or `legion_src_dir` is not provided by @trxcllnt in #411
- Fix GPU shard computation by @bryevdv in #433
- Only set default CMake generator if Ninja is available: Issue #374 by @jjwilke in #379
- Fix an issue with editable installs by @bryevdv in #434
- Allow only one of --legion-dir and --legion-src-dir by @jjwilke in #387
- legate/util: fix a mypy error on MacOS by @rohany in #438
- Improvements to legate.jupyter by @bryevdv in #425
- Fix for cunumeric#668 by @manopapad in #453
- Only keep traceback reprs, to avoid cycles by @bryevdv in #447
- Fix returned legion paths for editable install with separate legion b… by @jjwilke in #442
- Make `install.py` reconfigure editable installs when build type changes by @trxcllnt in #455
- fix for -ll:networks none, we will init MPI if it has not been initialized by @eddy16112 in #465
- Construct region-backed 0D stores in a correct way by @magnatelee in #450
- Pass a sufficiently high default value for gasnet's ibv-max-hcas by @manopapad in #477
- Make overlap check tight by @manopapad in #479
- Conda env script fixes by @manopapad in #481
- Fix some typos by @manopapad in #485
- fix several reference cycle / leak related bugs by @rohany in #488
- legate/core: fix FutureMap leak in communicator shutdown by @rohany in #495
- src/core/mapping: adjust indirect copy mapping for GPUs by @rohany in #499
- Don't access stream pools unless we're on GPUs by @magnatelee in #503
- Update env gen script so OS type works for mac by @m3vaz in #523
- Don't check for collective behavior when we have WRITE privilege by @manopapad in #526
- All NCCL ranks on the same node must get the same NCCL_IB_HCA by @manopapad in #528
- legate/core/_legion: add default new argument to dep part functions by @rohany in #527
- Don't turn on Legate debug checks on debug-rel builds by @manopapad in #533
- src/core: guard against missing projection functors in collective check by @rohany in #534
- Erase cached reduction instances that cannot be acquired by @magnatelee in #536
🚀 New Features
- Support for library specific annotations by @magnatelee in #464
- Cycle detection check by @manopapad in #361
- Implementing logic for reuse of reduction instances by @ipdemes in #511
- Use collective views when mapping by @ipdemes in #466
🛠️ Improvements
- Driver verbose only for rank 0 or "none" launcher by @bryevdv in #403
- Consolidate driver and test driver codebases by @bryevdv in #397
- On mapping failure retry after tightening non-RO reqs by @manopapad in #423
- More changes for provenance by @magnatelee in #417
- Allow launcher_extra to split quoted values by @bryevdv in #444
- Add script to generate conda envs by @bryevdv in #367
- Mapper improvements by @magnatelee in #452
- Support for concurrent launches by @magnatelee in #459
- legate/core/types: add missing `to_pandas_type` on Complex types by @rohany in #467
- Add --cprofile driver option by @bryevdv in #475
- Optimize scalar extraction by @magnatelee in #472
- Refactor CPU collective communicator by @eddy16112 in #468
- Refactoring changes by @magnatelee in #478
- Regenerate `install_info.py` on every build by @trxcllnt in #486
- Update create_buffer to use socket memories whenever available by @magnatelee in #487
- Check for cycles involving Futures after runtime shutdown by @manopapad in #496
- Fix for 509 by @magnatelee in #510
- Improve build documentation by @manopapad in #517
- Pass `CMAKE_GENERATOR` to scikit-build by @trxcllnt in #529
- Change the default CPU architecture to haswell. by @marcinz in #538
- Build rust `legion_prof` by @trxcllnt in #535
- adding logic for collective instances to the legate_select_sources by @ipdemes in #532
- Add support for building Legion with the UCX backend by @SeyedMir in #516
Full Changelog: v22.10.00...v23.01.00
v22.10.00
Release 22.10 contains several improvements to memory management, which recycle the memory of garbage-collected Legate stores more eagerly for new ones. Another major change in this release is a new build infrastructure for the Legate ecosystem based on CMake and scikit-build, a significant improvement over the previous ad-hoc build system. The release also includes two useful debugging features: 1) provenance tracking for tasks and other kinds of operations issued by client libraries and 2) detailed logging for client library mappers.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Fix target_memory setting for task futures by @manopapad in #327
- avoiding division by 0 in slicing by @ipdemes in #319
- Use the correct key when adding types to a library TypeSystem by @manopapad in #333
- fix partitioning of empty regions by @ipdemes in #337
- Correctly inline map 0d stores by @magnatelee in #340
- Fix error and warning message when --launcher is missing by @manopapad in #341
- make sure communicators are destroyed one by one by @eddy16112 in #345
- Specify an upper bound for return value sizes for `core::extract_scalar` by @magnatelee in #346
- Set an upper bound for allocations for the serdez type used in the core by @magnatelee in #348
- fix cpu communicator for omp by @eddy16112 in #352
- Use Legion primitives to coordinate accesses to the shared instance manager by @magnatelee in #355
- Synchronize instance manager accesses from mapper calls other than `map_task` by @magnatelee in #358
- Follow-on changes to #353 by @magnatelee in #360
- Scalar store fix by @magnatelee in #365
- InstanceManager segfault fixes by @manopapad in #368
- Fix typos in solver.py by @magnatelee in #366
- legate_core_cpp.cmake: add missing barrier header file in export by @rohany in #389
- Use Python GC to release Legion handles from destroyed RegionManagers by @magnatelee in #391
- legate/driver: fix driver legion_module path by @rohany in #394
- Legion bug WAR: don't instantiate futures on framebuffer by @manopapad in #409
- Revive dead region managers on field allocations by @magnatelee in #418
🚀 New Features
- Support for mapper logging by @magnatelee in #356
- Provenance tracking by @magnatelee in #370
- Add Fill operation by @manopapad in #369
- add jupyter config for legate by @eddy16112 in #309
🛠️ Improvements
- Make stores have an explicit bottom in their transform stacks by @magnatelee in #320
- Update conda env files to match cunumeric by @manopapad in #324
- An internal method to force initialize communicators by @magnatelee in #328
- Use empty buffers to create empty output stores by @magnatelee in #330
- Two improvements to error handling by @magnatelee in #336
- Skip conduit check when binding by @manopapad in #342
- Make numactl optional by @manopapad in #343
- Remove deprecated option --no-tensor by @manopapad in #344
- Silence shard registration warnings by @manopapad in #347
- Instance manager improvements by @magnatelee in #350
- A custom task wrapper for efficient handling of return values by @magnatelee in #353
- Refactoring to make the runtime object singleton by @magnatelee in #363
- Turn off the precise stacktrace capturing by default by @magnatelee in #362
- Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #323
- Support building with GASNet-Ex and MPI backends by @manopapad in #384
- Better store management by @magnatelee in #364
- Modularize the legate driver by @bryevdv in #371
- Add a pool of region managers with LRU eviction by @magnatelee in #392
- Adjust consensus match frequency based on field sizes by @magnatelee in #402
- On mapping failure retry after tightening non-RO reqs by @manopapad in #424
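Item #392 keeps a pool of region managers with LRU eviction. The general technique is sketched below in Python, assuming managers are keyed by a hashable shape and that an evicted manager releases its resources through a callback; `LRUPool` and `on_evict` are hypothetical names, not Legate internals.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUPool(Generic[K, V]):
    """Hypothetical pool holding at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity: int, on_evict: Callable[[K, V], None]) -> None:
        self.capacity = capacity
        self.on_evict = on_evict
        self._entries: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K, create: Callable[[], V]) -> V:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = create()
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # The first entry in insertion/access order is the least recently used.
            old_key, old_value = self._entries.popitem(last=False)
            self.on_evict(old_key, old_value)
        return value


evicted = []
pool = LRUPool(2, lambda key, value: evicted.append(key))
pool.get((10,), lambda: "manager-10")
pool.get((20,), lambda: "manager-20")
again = pool.get((10,), lambda: "unused")  # hit: (10,) becomes most recently used
pool.get((30,), lambda: "manager-30")      # over capacity: evicts (20,)
```

Bounding the pool caps how many managers stay alive, while the LRU order keeps recently used shapes cheap to reuse.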
Full Changelog: v22.08.00...v22.10.00
v22.08.00
Release 22.08.00 includes two major features: exception support and a communicator library for CPU and OpenMP tasks. The exception support captures any exception raised by a Legate task and re-raises it as a Python exception. The communicator library allows CPU and OpenMP tasks to perform explicit communication.
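The capture-and-re-raise pattern behind exception support is sketched below in plain Python, with the task boundary simulated by an ordinary function call. `run_task`, `launch`, and `TaskError` are hypothetical names; the real mechanism operates across Legion task launches, which this sketch does not model.

```python
from typing import Any, Callable, Optional, Tuple


class TaskError(Exception):
    """Hypothetical Python-side exception carrying a task's error message."""


def run_task(body: Callable[[], Any]) -> Tuple[Any, Optional[str]]:
    # Stand-in for the task side: run the body and capture any failure
    # as plain data instead of letting it escape the task.
    try:
        return body(), None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"


def launch(body: Callable[[], Any]) -> Any:
    # Stand-in for the caller: turn a captured failure back into an exception.
    result, error = run_task(body)
    if error is not None:
        raise TaskError(error)
    return result


value = launch(lambda: 41 + 1)
```

Returning the error as data keeps the task side free of control flow that the caller cannot observe, and the caller decides where to re-raise it.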
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
New Features
- API to pop off the transformation stack by @magnatelee in #299
- API to query the key partition of store by @magnatelee in #296
- Exception support by @magnatelee in #258
- Support for scaling on partition variables by @magnatelee in #270
- Implement collective communicator for multi-node CPUs by @eddy16112 in #214
Improvements
- Add type annotations to helper functions for indirect copies by @magnatelee in #245
- Split create_task by @bryevdv in #269
- Minor env and mypy updates by @bryevdv in #273
- let gasnet to handle mpi init and finalize by @eddy16112 in #275
- Get rid of the LEGATE_MAX_CPU_COMMS by @eddy16112 in #280
- src/core/comm: add simple implementation of pthread_barrier for mac by @rohany in #284
- Add some types needed for work on cunumeric by @bryevdv in #287
- Make the binding facility work on a single node by @magnatelee in #295
- Eliminate null shift transforms by @magnatelee in #300
- Pass the shape by default when creating an accessor by @magnatelee in #301
- Store refactoring by @magnatelee in #302
- Use bounding box to convert the store domain to a rect by @magnatelee in #303
- Broaden add_scalar_arg type by @bryevdv in #306
- Solver fix to correctly handle unaligned color shapes when no unbound stores exist by @magnatelee in #310
- Check if the task returned a buffer to every unbound store by @magnatelee in #312
- Note minimum CTK requirement on runtime requirements by @manopapad in #250
- Remove some unused code in the driver by @manopapad in #254
- Only overwrite installed legion_c_util.h if less recent by @manopapad in #256
- Conda build doesn't need to mess with CUDA stubs by @manopapad in #253
Bug Fixes
- Use phase barriers to work around the CUDA driver bug by @magnatelee in #246
- Add missing includes to aid intellisense providers by @trxcllnt in #249
- Minor type fix by @bryevdv in #252
- Update the launcher to initialize MPI with MPI_THREAD_MULTIPLE by @magnatelee in #264
- Set the GASNet variable for MPI init unconditionally by @magnatelee in #265
- Assign correct dimensions to unbound stores by @magnatelee in #267
- Set cuda virtual package as hard run requirement for gpu conda package by @m3vaz in #266
- Fixes for building with setup.py outside conda, primarily Mac (#260) by @jjwilke in #263
- use mpicc to compile the code when gasnet is enabled by @eddy16112 in #274
- fix the coll for non-mpi case. by @eddy16112 in #291
- StoreTransform needs a virtual destructor by @magnatelee in #304
- Correctly handle 0-D stores backed by region fields by @magnatelee in #315
- src: switch to absolute include path for pthread_barrier.h by @rohany in #311
- Temporarily drop support for UDP conduit by @manopapad in #305
New Contributors
- @trxcllnt made their first contribution in #249
- @eddy16112 made their first contribution in #214
- @jjwilke made their first contribution in #263
- @rohany made their first contribution in #284
Full Changelog: v22.05.03...v22.08.00
v22.05.03
v22.05.02
This hotfix release fixes issues in conda recipes.
What's Changed
- Cherry pick: Freeze Conda Compiler Versions (#261) by @marcinz in #276
- Cherry pick: Set cuda virtual package as hard run requirement for conda gpu package (#266) by @marcinz in #277
- Fix typo in conda run requirements by @m3vaz in #281
Full Changelog: v22.05.01...v22.05.02