We are looking at implementing padding support for lowering an aten.conv2d operation through tcf.conv2d_nchw and wanted to revisit our initial discussion about adding a pad operation to the reference backend.
Currently we are using the following lowering pattern for basic conv support:

    aten.conv2d(input, filter, bias, stride, padding, dilations, groups)
    // -convert-aten-to-tcf
    tcf.conv2d_nchw(input, filter, bias, stride, padding, dilations, groups)
    // -convert-tcf-to-linalg
    // bunch of shape-related error-handling
    linalg.conv_2d_nchw(...)
Looking back at our original npcomp conv2d discussion, we had talked about adding a tcp.pad op (used when padding != 0) that does a deep copy of the input into a padded buffer with the correct dimensions and then hands the result off to linalg.conv_*. That way the linalg ops only need to know how to handle VALID padding. The lowering flow for a conv2d would then look like this:
    tcf.conv2d(input, filter, bias, stride, dilations, padding)
    // --convert-tcf-to-linalg
    // if padding != 0
    %0 = tcp.pad(input, padding, fill_value/*=0*/)
    %1 = linalg.conv_2d_nchw(%0, ...)
    // else
    %0 = linalg.conv_2d_nchw(input, ...)
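As a sanity check on the "pad first, then VALID conv" idea, here is a small sketch of the output-size arithmetic for one spatial dimension. It uses the usual conv2d size formula; the function names are hypothetical and only for illustration, not part of npcomp or linalg.

```python
def conv_out_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output extent of one spatial dim, using the usual conv2d size formula."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


def padded_then_valid_out_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Same extent, computed as: pad first, then run a VALID (padding=0) conv."""
    padded_size = in_size + 2 * padding  # extent a pad op would produce
    return conv_out_size(padded_size, kernel, stride, padding=0, dilation=dilation)
```

Both functions agree for any symmetric padding, which is why the conv op itself would only ever need to see padding == 0 under this scheme.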
With this approach, tcp.pad would lower to something like:
    tcp.pad(input, padding, fill_value)
    // convert-tcp-to-std
    // get the shape of padded buffer
    %pad_buf = std.alloc(sizeof(padded_buf))
    %1 = std::fill(%pad_buf, input, <starting address>) // fill with subview at the right location
    %2 = linalg.conv_2d_nchw(%1, ...)
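To make the intended semantics concrete, here is a rough reference sketch of the pad step in plain Python on nested lists. It assumes NCHW layout and symmetric padding given as a hypothetical (pad_h, pad_w) pair; `pad_nchw` is an illustrative name, not an existing op or helper.

```python
def pad_nchw(inp, padding, fill_value=0.0):
    """Pad the H and W dims of a nested-list NCHW tensor.

    `padding` is an assumed (pad_h, pad_w) pair applied symmetrically.
    """
    pad_h, pad_w = padding
    n, c = len(inp), len(inp[0])
    h, w = len(inp[0][0]), len(inp[0][0][0])
    out_h, out_w = h + 2 * pad_h, w + 2 * pad_w

    # "alloc" + "fill": a fresh buffer of the padded shape, set to fill_value.
    out = [[[[fill_value] * out_w for _ in range(out_h)]
            for _ in range(c)] for _ in range(n)]

    # Copy the input into the interior "subview" at offset (pad_h, pad_w).
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                out[ni][ci][hi + pad_h][pad_w:pad_w + w] = inp[ni][ci][hi]
    return out
```

For example, `pad_nchw(x, (1, 1))` on a 1x1x2x2 input yields a 1x1x4x4 buffer with the original values in the center and `fill_value` around the border. The result can then be passed to a conv that only handles VALID padding.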
A few points for discussion:
- Does this still sound like a good first-pass approach?
- TCP is currently a mostly unused dialect. Do we still think this is the right place to add a PadOp, or should we just add it to TCF and then switch over to an upstream linalg lowering when there is one?
- Will lowering a PadOp to our own implementation that allocates break our current usage of the upstream bufferization passes? It looks like the TCPBufferize logic has stuck around for the SplatOp, but maybe there’s also a way to express the padding operation without having to allocate a constant pad buffer to move data into. Maybe we can reuse the