Sunday 12 September 2010

BSDNT - v0.6 addmul1, submul1

At last we come to addmul1 and submul1. After implementing these, we would
already be able to implement a full classical multiplication routine, using the
naive O(n^2) multiplication algorithm.

It turns out that addmul1 and submul1 are quite similar to mul1. Instead of
just writing out the result, we have to add it to, or subtract it from the
first operand.

These are really combined operations, i.e. combined mul1 and add_m or sub_m
respectively.

The reason we introduce these is that the processor has multiple pipelines
and merging two operations like this gives us the chance to push more
through those pipelines. We'll add more combined operations later on.

The fact that we can add b[i]*c, a[i] and ci and not overflow a double word
needs some justification.

Clearly b[i] and c are at most B-1. Thus b[i]*c is at most B^2-2B+1. And
clearly a[i] and ci are at most B-1. Thus the total is at most
B^2-2B+1 + (B-1) + (B-1) = B^2-1, which just fits into a double word.

The dword_t type allows us to get away with adding all these in C as though
we don't care about any overflow (in fact we know there is none). Of course
that doesn't make it efficient. This function is still a candidate for
assembly optimisation and loop unrolling later on.

The submul function is essentially the same, so long as we have taken care to
perform operations in the right order.

The naming of our functions causes us slight difficulties here. We'd ideally
like versions of addmul1 and submul1 which write the carry out to the high
limb and versions which add/sub it from the high limb. We opt for the latter
for now. After all, if one were accumulating addmuls, this is what one would
require most often.

Maybe someone reading this has a better idea how to handle these.

We add a test which checks that doing two addmuls in a row is the same as
doing a single addmul with the multiply constant equal to the sum of the
original two. We also add a test to check that chaining addmuls works. We
add similar tests for submul.

We delay coding up a quadratic multiplication basecase as there are a few more
linear functions to work on, most notably the various kinds of division and
remainder functions. These are all interesting functions to write.

The github repo for this version is here: v0.6

Previous article: v0.5 add, sub, cmp
Next article: v0.7 divrem1_simple

No comments:

Post a Comment