get latest #13

Merged
merged 5 commits into from
Feb 19, 2019

Conversation

ankan-ban
Owner

No description provided.

ankan-ban and others added 5 commits February 14, 2019 16:54
* misc changes to cudnn backend

- replace all cudaMemcpyAsync calls used for loading weights with cudaMemcpy, as the source buffer (in CPU memory) could be deleted before the async version of the function actually performs the copy.
- minor naming/style changes.
- add comment explaining what the policy map layer does and how the layout conversion from CHW to HWC works.
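
For readers unfamiliar with the CHW to HWC conversion mentioned above, here is a minimal CPU-side sketch of the index arithmetic. It is not the code from this PR: the function name `ChwToHwc` and its parameters are hypothetical, and the real backend does this work on the GPU. In CHW the element at channel `c`, row `h`, column `w` sits at offset `(c*H + h)*W + w`; in HWC the same element sits at `(h*W + w)*C + c`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the PR): copies a single tensor from
// CHW layout (channel is the slowest-varying index) to HWC layout
// (channel is the fastest-varying index).
std::vector<float> ChwToHwc(const std::vector<float>& chw, std::size_t C,
                            std::size_t H, std::size_t W) {
  if (chw.size() != C * H * W) {
    throw std::invalid_argument("input size does not match C*H*W");
  }
  std::vector<float> hwc(chw.size());
  for (std::size_t c = 0; c < C; ++c) {
    for (std::size_t h = 0; h < H; ++h) {
      for (std::size_t w = 0; w < W; ++w) {
        // Source offset in CHW, destination offset in HWC.
        hwc[(h * W + w) * C + c] = chw[(c * H + h) * W + w];
      }
    }
  }
  return hwc;
}
```

With C=2 and H=W=2, the input `{0,1,2,3, 10,11,12,13}` (two 2x2 planes) becomes `{0,10, 1,11, 2,12, 3,13}`: the values for each board square end up adjacent, one per channel.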

* fix typo in comment

* clang-format

* address review comment
* minor performance fixes

~5-10% improvement in CPU-limited cases (tested with a 32x4 network on a GTX 970)

* handle 64 filter SE networks

 - need numFc1Out of 16

* Update params.cc

* fix diff

* fix whitespace
* update copyright

* more copyright updates
* remove CUDNN_ACTIVATION_IDENTITY

* support cudnn 7.0

* simplify cudnn 7.0 path

* cudnn correctness fix
@ankan-ban ankan-ban merged commit 8f46984 into ankan-ban:master Feb 19, 2019
3 participants