**Some (Probably Useless) Number Theory**

Consider the following pattern and see if you can deduce the next number in the sequence:

25, 29, 85, 89, 145, 42, ...

Everything seems to proceed predictably until that darn 42, doesn't it? How about this next one? It's the same pattern, just with a different starting number!

23, 13, 10, 1, 1, ...

I enjoy trying to solve patterns like this, and will admit that this one stumped me for a long time when I first saw it. Here's one more example of the pattern, with yet another initial value:

2, 4, 16, 37, 58, 89, ...

Maybe you've got it now - let's call it the 'square digits' pattern. It is found by taking an input, squaring each of the digits, and adding them to proceed! Not too difficult to replicate, but a bugger to discover. There is something interesting going on with this pattern: Notice that both 2 and 25 eventually lead into 89 (from which point the patterns will be identical). Two primary questions emerge here:

1) Is this a repeating pattern for every number?

2) How many distinct pattern sequences are there? (e.g. 2 and
25 do **not** form distinct pattern sequences, because as soon
as they both hit 89 they are identical.)

Let's try and answer these questions rigorously. A little arithmetic convinces us that the two distinct patterns (of the three shown above) do repeat. 23 eventually becomes 1, and 1 is a fixed point in the pattern. Thus, we just get:

23, 13, 10, **1**, 1, 1, 1, 1, 1, 1, ...(forever)

The other pattern (perhaps we should call it the 'sequence of 89') also repeats:

25, 29, 85, **89**, **145**, **42**, **20**, **4**, **16**, **37**, **58**, 89, ...

In each example, the numbers in **bold** represent
the complete repetend of this pattern sequence.

Unfortunately, these two patterns do not constitute sufficient evidence that all such pattern sequences repeat, nor are we sure that these are the only two pattern sequences.

**The Punch Line**

Here's the amazing thing about this 'square digits' pattern: *All* pattern
sequences will repeat, and these are the *only two* non-trivial pattern
sequences (the trivial pattern sequence would be 0, 0, 0, ...). This is kinda
cool! Out of the infinite number of natural numbers, these are the only pattern
sequences
that result when the 'square digits' pattern is applied. Let's see
if we can show that this amazing fact is true.

Let's start with an observation: we don't have to check every positive integer. Notice that when we started the 'square digits' pattern with 25, we also learned that 29, 85, 89, 145, 42, 20, 4, 16, 37, and 58 all end up with the 'sequence of 89', Thus, just by observing what will occur later in the pattern, we can focus on fewer numbers to test.

But wait, you say! Even knowing that 10 or so extra numbers
are taken care of by any given number doesn't tell us anything about all
the numbers! Well, let's think of it this way. Suppose we wanted to check
all 1, 2, and
3 digit
numbers
to
see
if they all
follow one of these two pattern sequences. That's all the numbers 1, 2, 3,
...,
997, 998, 999. Well, of this set of nine hundred and ninety-nine numbers,
which one would lead to the largest *second* number if we applied
the 'square digits' pattern? That is, which number from 1 to 999 has the
largest digits
(larger
digits
lead to larger squares, which lead to a larger sum of the squares)? That
would be 999!

999, 243, 29, 85, **89**, ...

It's not too hard to see that 999 will follow the 'sequence
of 89,' but more importantly - look at the numbers after 999 and their relationship
to 999. The key observation is that they are all *less than* 999.
This is our aha! moment. Applying the 'square digits' pattern to large numbers
will result in lower numbers eventually down the line. (Think for a minute
about why this is so.)

What does this mean? It means that in order to find out if all the numbers from 1 - 999 follow one of our two pattern sequences, we only have to check the numbers from 1 - 243 because any number less than 999 will lead to a number less than 243. This is a lot less work, but it still only covers the numbers up to 999. Well, what if we wanted to check all the numbers up to 999,999? Well, that's six 9's and so 6 times 81 is 486. But by checking up to 243, we'll have already checked up to 999 (which would have included 486) and so we'll have effectively checked all the numbers through 999,999. Do you see now where this is going?

Let's see if we can formalize this a bit. If we want to check
all the numbers up to and including those with *k* digits, we really
only need to check those up to 81*k*. The observation we just made
is that if *k* ≥ 3, then 81*k* is less than the *k* digit
number composed entirely of 9's, which saves us a lot of work. The question
remains, though, how much less is it? If it is less than the **k ****-
1** digit number composed entirely of 9's, then we are done!! This
will be because checking the 123 digit numbers, for example, would only
require checking the 122 digit numbers. But to check the 122 digit numbers,
we would only have to check the 121 digit numbers, and so on down until the
four
digit numbers, for which we will see that we only have to check through
324, since 324 equals 4 times 81. (It is true that for *k* = 3, 81*k* <
999, but 81*k *>
99, which would be the two digit number composed entirely of 9's, as *k* =
3 implies *k *- 1 = 2. Hence our need to check through 324 instead
of 243 in the general case.)

So, what we need to prove is that for *k* > 3, the
inequality

holds,
because that is the way (in base 10) that we would get the *k* -
1 digit number composed entirely of 9's. Chew on this for a little bit to
ensure that it makes sense.

We can prove this statement inductively by first proving that
it holds for the smallest case, which is *k* =
4, and then by showing that when the statement is true for *k*, it
is automatically true for * k *+ 1. We'll check that it is true for* k*
= 4 first:

Now, we'll assume that the statement is true for *k*,
and show that it will then be true for *k* + 1:

The question remains, is this inequality true? Well, notice that if we remove the 81 from the left side and the first term on the right side we get:

which we are assuming is true (recall: we assume the statement
is true for *k*). Notice that if *k* > 3, then

and 9000 is definitely larger than 81. So, the smaller side
of the (assumed to be true) inequality is having the smaller thing added
to it (the 81) and the larger side is having the larger thing added to it
(9*10^[*k *- 1]), and so the new inequality is true as well.
This completes our proof by induction.
What
does
this all
mean?
It
means
that
to check all
the positive integers to see if they follow one of our pattern sequences,
we only have to check through 324. This is good news; our proof saved us
an
infinite amount of work.

Now, checking all the numbers through 324 is theoretically easy, but still a pain to do manually. This is where technology comes in - click here if you want the code to a non-elegant Maple program that will will perform these calculations for you, or click here if you just want to see the results. That sums up our proof, every positive integer will follow one of two pattern sequences, either the 'sequence of 89' or the 'sequence of 1,' when the 'square digits' pattern is applied. Cool, eh?

**A Generalization**

Recall the statement "...that is the way (in base 10) that we would..." What possible reason could there be for pointing out that we are in base 10? Well, the way the 'square digits' pattern operates (perhaps we should call it a function?) on any given initial value is entirely dependent on the digits of that initial value. So what? So, the digits used to represent any given number are dependent on the base we are using! Let's look at an example. Let's convert 17 to base 5 and then apply the 'square digits' pattern to each representation. In case you're rusty, to turn 17 into base 5 we look at the highest power of 5 less than 17, count up how many multiples of that power are less than 17, and then proceed to the next lower power of 5. So, 17 = 3 fives + 2 ones = 325 (the little 5 is the way we will denote that we are in base 5).

In base 10: 17, 50, 25, 29, 85, 89 ... (the 'sequence of 89')

In base 5: 325, 235, 235, 235, ... (a 'sequence of 235')

Why did that happen? Why does 235 repeat when 23 doesn't? Well, let's look at the calculations:

This tells us that in base 5, we have a totally new and unique
pattern sequence! This is interesting in its own right, but I think the
really interesting questions are these: In bases other than 10, does every
number
follow a repeating
sequence
for
the
'square
digits'
pattern? and, if so, How many distinct pattern sequences are there for a
given base? Well, one can see after only a little practice that computing
this
pattern,
by
hand, in bases other than 10 is *cumbersome*. Let's turn to Maple
again.

The version of Maple that I use (9.5 for the Mac) has the ability
to convert a base 10 number into another base (2 or greater), but it does
so
in a clunky
sort
of way. It returns a list of integers that represents the various
digits, but it 'thinks' about each of these numbers in the list as a base
10 numeral. Example: the command
convert(17, base, 5); will return the list [2, 3]. This is Maple's way
of
saying 2 ones plus 3 fives, which we would write as 325 (Maple
has the annoying habit of returning these numbers in the reverse order
of the way they are normally displayed). However, if we then try and do
something to the numbers in the list [2, 3], Maple thinks of them as [2
ones, 3 ones].
In
other
words,
as soon
as
the
conversion is completed, Maple *forgets* what the numbers in the
list represent and uses them as it would if it had never done any base
conversion.
(At
least, this is all I've been able to discover about Maple's internal workings.
If you aren't familiar with Maple or Maple programs and would like to learn
more, click here.) So, I wrote a program that,
while admittedly crude, does the calculations for us, but it returns the
numbers
as their
base
10
equivalent, regardless of the
input base. All the internal calculations preserve the new base's pattern
sequences, but the output is in base 10. For example, to compute the above
pattern sequence for 325, the program will return
17, 13, 13, 13, ... and so on.

This little quirk is a pain if we want to indentify precisely what the new pattern sequences for each base are, but not a problem at all if we just want to answer the two questions above (does each pattern eventually repeat? and how many patterns are there?). So, let's get to work!

One might hope that our work regarding the base 10 pattern
sequences would be of some use, and indeed it should be. The diffiulty lies
in getting our heads working in a new base. Let's say we want to see if every
positive integer will result in a finite number of pattern sequences when
represented in base *n* (*n* ≥ 2). Just as 9 is the largest
possible digit in base 10, the largest digit in base *n* will be *n* -
1. Suppose we want to check
all the *k* digit numbers. The *k* digit number with each of
its digits equal to *n* - 1 will be the largest
of these. If we apply the 'square digits' pattern to this number, the next
number
in
the
sequence
will
be:

The question we now have to answer is, what is this quantity less than? Is it less than

as it was in the case of base 10? (Before moving on, make sure you agree that this is the same value we worried about for base 10 numbers.) Let's write out the inequality completely and then rearrange it into some more compact noation:

This new notation, factoring out the (*n* - 1) and using
a summation symbol, not only makes the inequality easier on the eyes, but
it points out that we have a geometric sequence on the right. We have a sequence
of *k* - 1 terms where the 1st term is 1, and each consecutive term
is *n* multiplied by the previous term. This is good news because
we know how to sum
up a geometric sequence, either finite or infinite. Applying this bit
of knowledge, we rewrite the inequality to be:

This simplifies to be:

Now we are getting somewhere! So now, if we are working in
base *n* and we find a value of *k *for which this inequality
holds, will we be done? Almost. Our previous inductive proof was a specific
case of this inequality, *n* = 10, so we still have to prove something
generally. This is what we'll do - for a fixed value of *n* (the base
we are working in), we will assume that this inequality will hold for *k* (the
number of digits) and we will try and show that it will then also hold for *k* +
1. If we do that, then for each value of *n* we just have to find
a specific *k* that
it holds for (which was 4 for *n* = 10, if you recall) and run
a modified Maple program to test up through the value on the left hand side
of the inequality (324 = 4*81 in the base 10 example). Let's
do it.

Well, this doesn't look like our previous induction proof at all. In fact, all is not lost, (although, if you want to follow the previous induction steps directly, try this proof without rewriting in summation notation or finding the explicit sum for the geometric sequence). Let's think of it this way - to get to here, what have we added to the assumed true inequality? On the left hand side we've added:

while on the right had side we must have added (think about this one)

because our assumed true inequality was

So which one is bigger? The left hand side of the assumed true inequality is the smaller side, so if the piece we added to the left hand side is no bigger than the piece we added to the right hand side, then the new inequality will definitely be true and our generalized induction step will be complete. Is that the case? In other words, is the following statement true?

Let's factor the right hand side and rewrite the left hand side, and then divide by common terms to get

Remember
that *n* is a base, and so it is a positive integer of at least 2,
so division causes us no problems. So ask yourself, is a positive integer
subtract 1 smaller than that same
positive integer raised to a power (where that power is also a positive integer)?
Well, the answer is pretty obvious. The only time the right hand side wouldn't
be bigger is if *k* = 1. This just means that we will always have
to check at least the one digit numbers in any base (not too hard).

This has been a long way around to say that one of our conclusions
from base 10 generalizes. That is, every positive integer in any base will
follow one of a finite number of pattern sequences (because the big numbers
will still be eventually followed by smaller numbers as the digits are squared
and summed). In fact, to find out how many, recall that we just need to choose
an *n*,
then find a *k* for
which

holds, and then check all the numbers up to

**What's the catch?**

If you looked at the Maple program I wrote to compute the 'square
digits' pattern in base *n*, you noticed a parameter called len. len
told Maple how many times to apply the pattern and then return the list with
that many (+ 1 for the original number) entries. In the base 10 program,
this paramter was unnecessary because every positive integer eventually ended
in either the 'sequence of 1' or the 'sequence of 89' and so Maple could
stop calculating when it found one of these numbers. In base *n*,
we don't know what sequences the positive integers will end up in, especially
as we input different values for *n*. Thus, I wrote the program
to simply carry out the calculations a predetermined number of times, len
number of times to be precise. This is
dangerous,
because
if
we don't run the pattern far enough, we may not be able to tell if we have
a repeating pattern sequence or not. This means that if we try and count
the number of pattern sequences for a given base, we have to be careful about
how often we apply the pattern to a given number before we conclude that
we have a new pattern sequence - it may just be that we haven't computed
far enough!! If we don't compute far enough, we may think we have a
new pattern when, in reality, we just haven't gotten to a recognizable pattern
yet. This can result in us thinking that there are more pattern sequences
than there really are. Consider the following example: I wrote another Maple program that
attempts to count the number of distinct pattern sequences for any base* n*
using the previous two programs and the len parameter.
I ran this program on all the bases
from 2 to 30 using different values for len. Look at the different number
of pattern sequences the program identified:

len = 10: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 4, 10, 2, 10, 7, 8, 4, 15, 8, 18, 7, 9, 7, 15, 12, 11, 11]

len = 15: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 4, 10, 2, 9, 6, 7, 3, 13, 6, 15, 5, 9, 3, 13, 9, 9, 7]

len = 20: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 3, 10, 2, 9, 6, 6, 2, 13, 5, 15, 5, 9, 3, 12, 7, 9, 6]

len = 25: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 3, 10, 2, 9, 6, 6, 2, 13, 5, 15, 5, 9, 3, 12, 7, 9, 6]

Notice how drastic some of the differences are in the number
of pattern sequences the program found. As the value for len increased, the
number of patterns for some bases decreased. Even though the lists for len
= 20 and len = 25 are identical, it's *possible* that further increasing
len would alter these numbers, or that reusing these values of len
for
other bases would yield a different number of sequences for those bases.

How can we remove this ambiguity? How can we know for sure that we've computed the sequence long enough to verify that we have a new pattern instead of simply a number that wanders around awhile before arriving at a previously discovered pattern sequence? Well, we do know one thing - if we only have to check up to

then there must be no more than that many pattern sequences, right? In base 10, we knew right away that there couldn't be any more than 324 pattern sequences, because we knew that by checking up through 324 we'd have checked all the positive integers. What does this have to do with how far we have to check each number? One thing it tells us for sure is that if we find more pattern sequences than allowed by this value, we need to apply the 'square digits' pattern further to identify the imposters.

We may try and approach the problem another way: Remember when I said that we might think of the 'square digits' pattern as a function? Well, let's do that now. What we have is a function, say

for a *k* digit number. That is, we treat
the various powers of 10 as 'basis vectors' and we write out any input number
in terms of those 'basis vectors' and see that the function is simply the
sum
of the squares of the coefficients that make up the number we're interested
in. If we assume that we are writing each number in base 10, then each coefficient *ai* is
in the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, which, unfortunately, does not
form a
field, and so we don't
have an actual vector
space in the strictest sense, which is really too bad because the quantity
on the right looks a lot like a formula for a norm.

Nevertheless, we can look at it in a (hopefully) productive manner. Since we only have to check the numbers up to

we might ask the question:

In other words, how far can a number 'bounce around' before
arriving at one of the finite number of pattern sequences? If the answer
to the above question involving *j* is yes, then it can at most bounce around
for

numbers (minus, of course, the number of patterns we have previously
discovered). Just so we can keep a grip on this, let's think about the specific
example of base 10. We know that we only have to check through 324 because
of our induction proof and the fact that *f*(999) = 324. So, let's
look at *f*(324) = 29. This is good
news, because 29 < 324. We might be tempted to ask, is *f*(*x*)
< *x* always? Sadly no, as we will see if we check *f*(*f*(324))
= *f*(29) = 85. Nevertheless, 85 < 324, so things have not totally
gone awry. But, will this keep happening? If we plug any number less than
324 into *f*, will the output be less than 324? Well, what number
less than or equal to 324 will have the largest output value for* f*?
It would seem to be the number with the most digits possible, the most 9's
possible in those digits, and the largest numbers possible for any remaining
digits. That describes the number 299 perfectly. So, what is *f*(299)?
It's 4 + 81 + 81 = 166 < 324. Yeah!! That means that any number in base
10 could *theoretically* only bounce around for 324 numbers before
arriving at one of the pattern sequences.

Does a similar result hold for base *n*? Well, suppose
we have found a value for *k* for which

is true. Then, we need to know which number less than or equal
to the left hand side will have the largest output value in *f*. That's
a tricky question, because we are not sure how many digits the number on
the left is. I can show, though it doesn't seem to be helpful, that if *k* is
less than or equal to half of *n*, then it will have 3 digits. Other
than that, I'm temporarily stuck trying to find a particular
value for len that will guarantee that we have applied *f* enough
times. I would like to close
this little loophole, and would welcome any help!

Nevertheless, there is a slicker programming method, though
it is not quite as mathematically satisfying. Recall that in the program
for
base
10,
no
len parameter existed. Why is that? Well, the hypothesis for base 10 was
that each number would end up in one of 2 known pattern sequences, and that
they would repeat from there. In other words, if I got to 89 (or 1) in my
calculations, I didn't have to proceed any further because I knew the full
'sequence of 89' (or 'sequence of 1'). For a given base *n*, I may
not know at the beginning of my calculations how many pattern sequences I
am trying to find, but I do know that if I find one that it will be finite
in length. In other words, every pattern sequence will begin to repeat and
if I carry my calculations until I see where it begins to repeat, I've carried
it far enough (though I don't know a specific length for how far that needs
to be). How do I know that each pattern is of finite length, or that it repeats?
Well, let's say I have a number *x* that I begin to apply *f* to.
What I am claiming is that

for some *j* < *k* (it's obviously true if *j* is
equal to *k*). Why is this? Let's suppose
to the contrary, that each time I apply *f* I will get a totally new
number that has not occured before. One can see that this would require these
new numbers to be arbitrarily large. (e.g. If I got a new number each time
I applied *f*, with the millionth application of *f* the output
would have to be at least 1 million.) Well, this can't happen, because we've
already shown that when you apply *f* to large numbers you get smaller
numbers eventually down the line. So each pattern sequence must be finite,
and I can rewrite each Maple program so that they will simply keep going
until they find a repeating pattern. That way we eliminate our problem of
overestimating
and will get the exact number of pattern sequences in any base for the 'square
digits' pattern. Here they are again, after implementing these new programs:

[1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 3, 10, 2, 9, 6, 6, 2, 13, 5, 15, 5, 9, 2, 12, 7, 9, 5]

Notice that there are a couple of differences from the len = 25 list above, for bases 26 and 30.

**Another Generalization**

One may ask, "What is the significance of squaring each digit
in the pattern sequence? Do the results change if a power other than 2 is
used?" Those are excellent questions, and we will now begin to address them.
The basic question is this - For a given power *p* (let's restrict
ourselves to positive integers greater than or equal to 2, shall we?) and
a base *n*,
will each positive integer follow one of a finite number of pattern sequences?
and If so, how many pattern sequences will occur when the '*p-*digits'
pattern is applied? Note that we can no longer call our pattern the '*square* digits'
pattern.) Look at what we've gotten ourselves into now!

To answer the first question - will each positive integer follow
one of a finite number of pattern sequences? - we need to look at some calculations
similar to those that we've done before. If we are in base *n*, using
a power *p*, what will happen to the *k*th digit number when
the '*p*-digits' pattern is applied? We will get

The question is, what is this value less than? Is it less than

like it was before, or did the substitution of *p* for
2 screw things up completely? This might be a good place to use the same
simplification procedures we used earlier:

Like before, we need to prove that this inequality is true
for large enough *k* when *n* and *p* are fixed. If
we do that, we'll have proved that any positive integer in any base will
follow one of a finite number of sequences for the '*p*-digits' pattern
for any value of *p*. Again, we work by induction - meaning that we'll
assume the above inequality is true for a choice of *n*, *p*,
and arbitrary *k* and we'll show that it must therefore be true for *n*, *p*,
and *k* + 1. If we look at the inequality for *k* + 1 we get:

Just like before, we are essentially asking the question, is -

This is why finding the explicit sum for the sequence is handy,
and why I said we need to prove the inequality for large enough *k*.
Clearly, sometimes this inequality will be true and sometimes it won't, depending
on the respective values for *p* and *k*. Remember, though, *p* is
a fixed value. If we can find values of *k* large enough to make this
true, then we are in business (not too hard once we decide on *p*).
So, here are some generalized Maple programs to incorporate different *p* values
into the pattern.

How does this change all of our results? Below are some results
of the generalized programs run with varying values for *p* and *n* (this
is *very* computationally
intensive, even for Maple).
They are arranged in a matrix - so, if you want to know the program's calculation
of how many patterns there are for base *n* with power *p*,
you will look at the entry in the *n* - 1 column and the *p* -
1 row. Thus, base 10 with *p* = 2 is found in row 1, column 9 (the
entry is 2, as we already know). I've only computed from bases 2 to 10
with *p* ranging from 2 to 4 because the program is so computationally
intensive. However, feel free to use the
programs to perform more calculations on your own system if you'd like, or
better yet, write some more efficient programs
(I have not spent much time optimizing these programs; I'm sure there could
be better versions). Maybe you'll be able to speed up the computations enough
that we can extend this matrix and discover new and more interesting patterns!