-
Notifications
You must be signed in to change notification settings - Fork 3k
WIP: [CPU][ARM] Weights compression f32->f16 is moved to CPU Plug-in side #21080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: [CPU][ARM] Weights compression f32->f16 is moved to CPU Plug-in side #21080
Conversation
c8cee60 to
d62e4d3
Compare
d62e4d3 to
2e42899
Compare
0f383e0 to
a09364c
Compare
|
@itikhono may I ask you to review transformation changes? |
src/plugins/intel_cpu/tests/functional/subgraph_tests/src/arm/matmul_compress_convert.cpp
Outdated
Show resolved
Hide resolved
|
LGTM from tests side |
3bcbe16 to
aaa63c7
Compare
| } | ||
| }; | ||
|
|
||
| class TRANSFORMATIONS_API Compression : public RuntimeAttribute { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please provide in comment with explanation why we need this rt_info? Fron which it will be clear why we cannot use the existing ones a need a new rt_info
| Compression() = default; | ||
|
|
||
| bool visit_attributes(AttributeVisitor& visitor) override { | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it really necessary to store this rt_info to IR?
| copy_runtime_info(incoming_node, convert); | ||
| input.replace_source_output(convert); | ||
| disable_fp16_compression(convert); | ||
| mark_as_compression(convert); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This converts are decompression converts: they upcast to fp32 for precision sensitive subgraphs. Is it possible to rename mark_as_compression to avoid confusions?
Since mark_as_compression is used only for converts that are inserted to align types for f16 and f32 parts, can we name it e.g. mark_type_aligning_convert to avoid confusion?
aaa63c7 to
0e48570
Compare
0e48570 to
1b28c78
Compare
|
This PR will be closed in a week because of 2 weeks of no activity. |
|
This PR was closed because it has been stalled for 2 week with no activity. |
Details:
The PR disables weights compression fp32->fp16 on the ngraph side and moves them to the CPU plug-in in fp32 precision. It allows us to improve memory consumption on ARM64 platforms. This change only affects MatMul nodes
PR to oneDNN fork: openvinotoolkit/oneDNN#220
Tickets: