Why is Conv2DBackpropInputOp lowered to mhlo.reverse + mhlo.convolution instead of a library call?

Hi all, in legalize_tf.cc, tf.Conv?DBackpropInputOp is legalized to mhlo.reverse + mhlo.convolution. cudnnConvolutionBackwardData is the cuDNN API for this computation, and it should outperform mhlo.reverse + mhlo.convolution.
I wrote a unit test and benchmarked locally: the cuDNN call is ~1.13x faster than mhlo.reverse + mhlo.convolution. Why not lower to the library call directly? Is there a particular consideration, or is it just for generality?
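For context, here is a small JAX sketch (mine, not from the thread; shapes and padding are illustrative assumptions) of the identity the legalization relies on: for a stride-1 "SAME" convolution with an odd kernel, the input gradient equals a forward convolution of the output gradient with the spatially reversed, channel-swapped filter.

```python
# Numerical sketch of the reverse + convolution identity behind the lowering.
import jax
import jax.numpy as jnp
import numpy as np

x = jnp.asarray(np.random.randn(1, 8, 8, 4), jnp.float32)   # NHWC input
w = jnp.asarray(np.random.randn(3, 3, 4, 16), jnp.float32)  # HWIO filter

def conv(lhs, rhs):
    return jax.lax.conv_general_dilated(
        lhs, rhs, window_strides=(1, 1), padding="SAME",
        dimension_numbers=("NHWC", "HWIO", "NHWC"))

# Reference input gradient via autodiff (what tf.Conv2DBackpropInput computes).
dy = jnp.asarray(np.random.randn(1, 8, 8, 16), jnp.float32)
dx_ref = jax.vjp(lambda t: conv(t, w), x)[1](dy)[0]

# reverse + convolution: flip the filter's spatial dims, swap its input/output
# channel dims, then run a plain forward convolution on the output gradient.
w_rev = jnp.flip(w, axis=(0, 1)).transpose(0, 1, 3, 2)
dx = conv(dy, w_rev)

print(np.allclose(dx_ref, dx, atol=1e-4))  # True
```

Because the two sides are mathematically equivalent, the question is purely one of performance: a hand-tuned library kernel versus a compiler-visible pair of ops.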

GPUs are not our only target, nor are CUDA-capable GPUs the only GPUs we care about; the largest internal user of the bridge is TPUs. Once we've gotten to doing more aggressive codegen, it will be a good question whether that delta stays the same (in which case we can either pattern-match the pair back into a library call or change the conversion target for GPUs), or whether the additional opportunities exposed by having a transparent computation result in better end-to-end performance, even if the op looks worse in isolation.


Thank you for your explanation.