Some (Probably Useless) Number Theory

Consider the following pattern and see if you can deduce the next number in the sequence:

25, 29, 85, 89, 145, 42, ...

Everything seems to proceed predictably until that darn 42, doesn't it? How about this next one? It's the same pattern, just with a different starting number!

23, 13, 10, 1, 1, ...

I enjoy trying to solve patterns like this, and will admit that this one stumped me for a long time when I first saw it. Here's one more example of the pattern, with yet another initial value:

2, 4, 16, 37, 58, 89, ...

Maybe you've got it now - let's call it the 'square digits' pattern. It is found by taking an input, squaring each of the digits, and adding them to proceed! Not too difficult to replicate, but a bugger to discover. There is something interesting going on with this pattern: Notice that both 2 and 25 eventually lead into 89 (from which point the patterns will be identical). Two primary questions emerge here:

1) Is this a repeating pattern for every number?

2) How many distinct pattern sequences are there? (e.g. 2 and 25 do not form distinct pattern sequences, because as soon as they both hit 89 they are identical.)

Let's try and answer these questions rigorously. A little arithmetic convinces us that the two distinct patterns (of the three shown above) do repeat. 23 eventually becomes 1, and 1 is a fixed point in the pattern. Thus, we just get:

23, 13, 10, 1, 1, 1, 1, 1, 1, 1, ...(forever)

The other pattern (perhaps we should call it the 'sequence of 89') also repeats:

25, 29, 85, 89, 145, 42, 20, 4, 16, 37, 58, 89, ...

In each example, the numbers in bold represent the complete repetend of this pattern sequence.

Unfortunately, these two patterns do not constitute sufficient evidence that all such pattern sequences repeat, nor are we sure that these are the only two pattern sequences.

The Punch Line

Here's the amazing thing about this 'square digits' pattern: All pattern sequences will repeat, and these are the only two non-trivial pattern sequences (the trivial pattern sequence would be 0, 0, 0, ...). This is kinda cool! Out of the infinite number of natural numbers, these are the only pattern sequences that result when the 'square digits' pattern is applied. Let's see if we can show that this amazing fact is true.

Let's start with an observation: we don't have to check every positive integer. Notice that when we started the 'square digits' pattern with 25, we also learned that 29, 85, 89, 145, 42, 20, 4, 16, 37, and 58 all end up with the 'sequence of 89', Thus, just by observing what will occur later in the pattern, we can focus on fewer numbers to test.

But wait, you say! Even knowing that 10 or so extra numbers are taken care of by any given number doesn't tell us anything about all the numbers! Well, let's think of it this way. Suppose we wanted to check all 1, 2, and 3 digit numbers to see if they all follow one of these two pattern sequences. That's all the numbers 1, 2, 3, ..., 997, 998, 999. Well, of this set of nine hundred and ninety-nine numbers, which one would lead to the largest second number if we applied the 'square digits' pattern? That is, which number from 1 to 999 has the largest digits (larger digits lead to larger squares, which lead to a larger sum of the squares)? That would be 999!

999, 243, 29, 85, 89, ...

It's not too hard to see that 999 will follow the 'sequence of 89,' but more importantly - look at the numbers after 999 and their relationship to 999. The key observation is that they are all less than 999. This is our aha! moment. Applying the 'square digits' pattern to large numbers will result in lower numbers eventually down the line. (Think for a minute about why this is so.)

What does this mean? It means that in order to find out if all the numbers from 1 - 999 follow one of our two pattern sequences, we only have to check the numbers from 1 - 243 because any number less than 999 will lead to a number less than 243. This is a lot less work, but it still only covers the numbers up to 999. Well, what if we wanted to check all the numbers up to 999,999? Well, that's six 9's and so 6 times 81 is 486. But by checking up to 243, we'll have already checked up to 999 (which would have included 486) and so we'll have effectively checked all the numbers through 999,999. Do you see now where this is going?

Let's see if we can formalize this a bit. If we want to check all the numbers up to and including those with k digits, we really only need to check those up to 81k. The observation we just made is that if k ≥ 3, then 81k is less than the k digit number composed entirely of 9's, which saves us a lot of work. The question remains, though, how much less is it? If it is less than the k - 1 digit number composed entirely of 9's, then we are done!! This will be because checking the 123 digit numbers, for example, would only require checking the 122 digit numbers. But to check the 122 digit numbers, we would only have to check the 121 digit numbers, and so on down until the four digit numbers, for which we will see that we only have to check through 324, since 324 equals 4 times 81. (It is true that for k = 3, 81k < 999, but 81k > 99, which would be the two digit number composed entirely of 9's, as k = 3 implies k - 1 = 2. Hence our need to check through 324 instead of 243 in the general case.)

So, what we need to prove is that for k > 3, the inequality

holds, because that is the way (in base 10) that we would get the k - 1 digit number composed entirely of 9's. Chew on this for a little bit to ensure that it makes sense.

We can prove this statement inductively by first proving that it holds for the smallest case, which is k = 4, and then by showing that when the statement is true for k, it is automatically true for k + 1. We'll check that it is true for k = 4 first:

Now, we'll assume that the statement is true for k, and show that it will then be true for k + 1:

The question remains, is this inequality true? Well, notice that if we remove the 81 from the left side and the first term on the right side we get:

which we are assuming is true (recall: we assume the statement is true for k). Notice that if k > 3, then

and 9000 is definitely larger than 81. So, the smaller side of the (assumed to be true) inequality is having the smaller thing added to it (the 81) and the larger side is having the larger thing added to it (9*10^[k - 1]), and so the new inequality is true as well. This completes our proof by induction. What does this all mean? It means that to check all the positive integers to see if they follow one of our pattern sequences, we only have to check through 324. This is good news; our proof saved us an infinite amount of work.

Now, checking all the numbers through 324 is theoretically easy, but still a pain to do manually. This is where technology comes in - click here if you want the code to a non-elegant Maple program that will will perform these calculations for you, or click here if you just want to see the results. That sums up our proof, every positive integer will follow one of two pattern sequences, either the 'sequence of 89' or the 'sequence of 1,' when the 'square digits' pattern is applied. Cool, eh?

A Generalization

Recall the statement "...that is the way (in base 10) that we would..." What possible reason could there be for pointing out that we are in base 10? Well, the way the 'square digits' pattern operates (perhaps we should call it a function?) on any given initial value is entirely dependent on the digits of that initial value. So what? So, the digits used to represent any given number are dependent on the base we are using! Let's look at an example. Let's convert 17 to base 5 and then apply the 'square digits' pattern to each representation. In case you're rusty, to turn 17 into base 5 we look at the highest power of 5 less than 17, count up how many multiples of that power are less than 17, and then proceed to the next lower power of 5. So, 17 = 3 fives + 2 ones = 325 (the little 5 is the way we will denote that we are in base 5).

In base 10: 17, 50, 25, 29, 85, 89 ... (the 'sequence of 89')

In base 5: 325, 235, 235, 235, ... (a 'sequence of 235')

Why did that happen? Why does 235 repeat when 23 doesn't? Well, let's look at the calculations:

This tells us that in base 5, we have a totally new and unique pattern sequence! This is interesting in its own right, but I think the really interesting questions are these: In bases other than 10, does every number follow a repeating sequence for the 'square digits' pattern? and, if so, How many distinct pattern sequences are there for a given base? Well, one can see after only a little practice that computing this pattern, by hand, in bases other than 10 is cumbersome. Let's turn to Maple again.

The version of Maple that I use (9.5 for the Mac) has the ability to convert a base 10 number into another base (2 or greater), but it does so in a clunky sort of way. It returns a list of integers that represents the various digits, but it 'thinks' about each of these numbers in the list as a base 10 numeral. Example: the command convert(17, base, 5); will return the list [2, 3]. This is Maple's way of saying 2 ones plus 3 fives, which we would write as 325 (Maple has the annoying habit of returning these numbers in the reverse order of the way they are normally displayed). However, if we then try and do something to the numbers in the list [2, 3], Maple thinks of them as [2 ones, 3 ones]. In other words, as soon as the conversion is completed, Maple forgets what the numbers in the list represent and uses them as it would if it had never done any base conversion. (At least, this is all I've been able to discover about Maple's internal workings. If you aren't familiar with Maple or Maple programs and would like to learn more, click here.) So, I wrote a program that, while admittedly crude, does the calculations for us, but it returns the numbers as their base 10 equivalent, regardless of the input base. All the internal calculations preserve the new base's pattern sequences, but the output is in base 10. For example, to compute the above pattern sequence for 325, the program will return 17, 13, 13, 13, ... and so on.

This little quirk is a pain if we want to indentify precisely what the new pattern sequences for each base are, but not a problem at all if we just want to answer the two questions above (does each pattern eventually repeat? and how many patterns are there?). So, let's get to work!

One might hope that our work regarding the base 10 pattern sequences would be of some use, and indeed it should be. The diffiulty lies in getting our heads working in a new base. Let's say we want to see if every positive integer will result in a finite number of pattern sequences when represented in base n (n ≥ 2). Just as 9 is the largest possible digit in base 10, the largest digit in base n will be n - 1. Suppose we want to check all the k digit numbers. The k digit number with each of its digits equal to n - 1 will be the largest of these. If we apply the 'square digits' pattern to this number, the next number in the sequence will be:

The question we now have to answer is, what is this quantity less than? Is it less than

as it was in the case of base 10? (Before moving on, make sure you agree that this is the same value we worried about for base 10 numbers.) Let's write out the inequality completely and then rearrange it into some more compact noation:

This new notation, factoring out the (n - 1) and using a summation symbol, not only makes the inequality easier on the eyes, but it points out that we have a geometric sequence on the right. We have a sequence of k - 1 terms where the 1st term is 1, and each consecutive term is n multiplied by the previous term. This is good news because we know how to sum up a geometric sequence, either finite or infinite. Applying this bit of knowledge, we rewrite the inequality to be:

This simplifies to be:

Now we are getting somewhere! So now, if we are working in base n and we find a value of k for which this inequality holds, will we be done? Almost. Our previous inductive proof was a specific case of this inequality, n = 10, so we still have to prove something generally. This is what we'll do - for a fixed value of n (the base we are working in), we will assume that this inequality will hold for k (the number of digits) and we will try and show that it will then also hold for k + 1. If we do that, then for each value of n we just have to find a specific k that it holds for (which was 4 for n = 10, if you recall) and run a modified Maple program to test up through the value on the left hand side of the inequality (324 = 4*81 in the base 10 example). Let's do it.

Well, this doesn't look like our previous induction proof at all. In fact, all is not lost, (although, if you want to follow the previous induction steps directly, try this proof without rewriting in summation notation or finding the explicit sum for the geometric sequence). Let's think of it this way - to get to here, what have we added to the assumed true inequality? On the left hand side we've added:

while on the right had side we must have added (think about this one)

because our assumed true inequality was

So which one is bigger? The left hand side of the assumed true inequality is the smaller side, so if the piece we added to the left hand side is no bigger than the piece we added to the right hand side, then the new inequality will definitely be true and our generalized induction step will be complete. Is that the case? In other words, is the following statement true?

Let's factor the right hand side and rewrite the left hand side, and then divide by common terms to get

Remember that n is a base, and so it is a positive integer of at least 2, so division causes us no problems. So ask yourself, is a positive integer subtract 1 smaller than that same positive integer raised to a power (where that power is also a positive integer)? Well, the answer is pretty obvious. The only time the right hand side wouldn't be bigger is if k = 1. This just means that we will always have to check at least the one digit numbers in any base (not too hard).

This has been a long way around to say that one of our conclusions from base 10 generalizes. That is, every positive integer in any base will follow one of a finite number of pattern sequences (because the big numbers will still be eventually followed by smaller numbers as the digits are squared and summed). In fact, to find out how many, recall that we just need to choose an n, then find a k for which

holds, and then check all the numbers up to

What's the catch?

If you looked at the Maple program I wrote to compute the 'square digits' pattern in base n, you noticed a parameter called len. len told Maple how many times to apply the pattern and then return the list with that many (+ 1 for the original number) entries. In the base 10 program, this paramter was unnecessary because every positive integer eventually ended in either the 'sequence of 1' or the 'sequence of 89' and so Maple could stop calculating when it found one of these numbers. In base n, we don't know what sequences the positive integers will end up in, especially as we input different values for n. Thus, I wrote the program to simply carry out the calculations a predetermined number of times, len number of times to be precise. This is dangerous, because if we don't run the pattern far enough, we may not be able to tell if we have a repeating pattern sequence or not. This means that if we try and count the number of pattern sequences for a given base, we have to be careful about how often we apply the pattern to a given number before we conclude that we have a new pattern sequence - it may just be that we haven't computed far enough!! If we don't compute far enough, we may think we have a new pattern when, in reality, we just haven't gotten to a recognizable pattern yet. This can result in us thinking that there are more pattern sequences than there really are. Consider the following example: I wrote another Maple program that attempts to count the number of distinct pattern sequences for any base n using the previous two programs and the len parameter. I ran this program on all the bases from 2 to 30 using different values for len. Look at the different number of pattern sequences the program identified:

len = 10: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 4, 10, 2, 10, 7, 8, 4, 15, 8, 18, 7, 9, 7, 15, 12, 11, 11]

len = 15: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 4, 10, 2, 9, 6, 7, 3, 13, 6, 15, 5, 9, 3, 13, 9, 9, 7]

len = 20: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 3, 10, 2, 9, 6, 6, 2, 13, 5, 15, 5, 9, 3, 12, 7, 9, 6]

len = 25: [1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 3, 10, 2, 9, 6, 6, 2, 13, 5, 15, 5, 9, 3, 12, 7, 9, 6]

Notice how drastic some of the differences are in the number of pattern sequences the program found. As the value for len increased, the number of patterns for some bases decreased. Even though the lists for len = 20 and len = 25 are identical, it's possible that further increasing len would alter these numbers, or that reusing these values of len for other bases would yield a different number of sequences for those bases.

How can we remove this ambiguity? How can we know for sure that we've computed the sequence long enough to verify that we have a new pattern instead of simply a number that wanders around awhile before arriving at a previously discovered pattern sequence? Well, we do know one thing - if we only have to check up to

then there must be no more than that many pattern sequences, right? In base 10, we knew right away that there couldn't be any more than 324 pattern sequences, because we knew that by checking up through 324 we'd have checked all the positive integers. What does this have to do with how far we have to check each number? One thing it tells us for sure is that if we find more pattern sequences than allowed by this value, we need to apply the 'square digits' pattern further to identify the imposters.

We may try and approach the problem another way: Remember when I said that we might think of the 'square digits' pattern as a function? Well, let's do that now. What we have is a function, say

for a k digit number. That is, we treat the various powers of 10 as 'basis vectors' and we write out any input number in terms of those 'basis vectors' and see that the function is simply the sum of the squares of the coefficients that make up the number we're interested in. If we assume that we are writing each number in base 10, then each coefficient ai is in the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, which, unfortunately, does not form a field, and so we don't have an actual vector space in the strictest sense, which is really too bad because the quantity on the right looks a lot like a formula for a norm.

Nevertheless, we can look at it in a (hopefully) productive manner. Since we only have to check the numbers up to

we might ask the question:

In other words, how far can a number 'bounce around' before arriving at one of the finite number of pattern sequences? If the answer to the above question involving j is yes, then it can at most bounce around for

numbers (minus, of course, the number of patterns we have previously discovered). Just so we can keep a grip on this, let's think about the specific example of base 10. We know that we only have to check through 324 because of our induction proof and the fact that f(999) = 324. So, let's look at f(324) = 29. This is good news, because 29 < 324. We might be tempted to ask, is f(x) < x always? Sadly no, as we will see if we check f(f(324)) = f(29) = 85. Nevertheless, 85 < 324, so things have not totally gone awry. But, will this keep happening? If we plug any number less than 324 into f, will the output be less than 324? Well, what number less than or equal to 324 will have the largest output value for f? It would seem to be the number with the most digits possible, the most 9's possible in those digits, and the largest numbers possible for any remaining digits. That describes the number 299 perfectly. So, what is f(299)? It's 4 + 81 + 81 = 166 < 324. Yeah!! That means that any number in base 10 could theoretically only bounce around for 324 numbers before arriving at one of the pattern sequences.

Does a similar result hold for base n? Well, suppose we have found a value for k for which

is true. Then, we need to know which number less than or equal to the left hand side will have the largest output value in f. That's a tricky question, because we are not sure how many digits the number on the left is. I can show, though it doesn't seem to be helpful, that if k is less than or equal to half of n, then it will have 3 digits. Other than that, I'm temporarily stuck trying to find a particular value for len that will guarantee that we have applied f enough times. I would like to close this little loophole, and would welcome any help!

Nevertheless, there is a slicker programming method, though it is not quite as mathematically satisfying. Recall that in the program for base 10, no len parameter existed. Why is that? Well, the hypothesis for base 10 was that each number would end up in one of 2 known pattern sequences, and that they would repeat from there. In other words, if I got to 89 (or 1) in my calculations, I didn't have to proceed any further because I knew the full 'sequence of 89' (or 'sequence of 1'). For a given base n, I may not know at the beginning of my calculations how many pattern sequences I am trying to find, but I do know that if I find one that it will be finite in length. In other words, every pattern sequence will begin to repeat and if I carry my calculations until I see where it begins to repeat, I've carried it far enough (though I don't know a specific length for how far that needs to be). How do I know that each pattern is of finite length, or that it repeats? Well, let's say I have a number x that I begin to apply f to. What I am claiming is that

for some j < k (it's obviously true if j is equal to k). Why is this? Let's suppose to the contrary, that each time I apply f I will get a totally new number that has not occured before. One can see that this would require these new numbers to be arbitrarily large. (e.g. If I got a new number each time I applied f, with the millionth application of f the output would have to be at least 1 million.) Well, this can't happen, because we've already shown that when you apply f to large numbers you get smaller numbers eventually down the line. So each pattern sequence must be finite, and I can rewrite each Maple program so that they will simply keep going until they find a repeating pattern. That way we eliminate our problem of overestimating and will get the exact number of pattern sequences in any base for the 'square digits' pattern. Here they are again, after implementing these new programs:

[1, 4, 1, 4, 2, 7, 6, 5, 2, 5, 7, 10, 3, 10, 2, 9, 6, 6, 2, 13, 5, 15, 5, 9, 2, 12, 7, 9, 5]

Notice that there are a couple of differences from the len = 25 list above, for bases 26 and 30.

Another Generalization

One may ask, "What is the significance of squaring each digit in the pattern sequence? Do the results change if a power other than 2 is used?" Those are excellent questions, and we will now begin to address them. The basic question is this - For a given power p (let's restrict ourselves to positive integers greater than or equal to 2, shall we?) and a base n, will each positive integer follow one of a finite number of pattern sequences? and If so, how many pattern sequences will occur when the 'p-digits' pattern is applied? Note that we can no longer call our pattern the 'square digits' pattern.) Look at what we've gotten ourselves into now!

To answer the first question - will each positive integer follow one of a finite number of pattern sequences? - we need to look at some calculations similar to those that we've done before. If we are in base n, using a power p, what will happen to the kth digit number when the 'p-digits' pattern is applied? We will get

The question is, what is this value less than? Is it less than

like it was before, or did the substitution of p for 2 screw things up completely? This might be a good place to use the same simplification procedures we used earlier:

Like before, we need to prove that this inequality is true for large enough k when n and p are fixed. If we do that, we'll have proved that any positive integer in any base will follow one of a finite number of sequences for the 'p-digits' pattern for any value of p. Again, we work by induction - meaning that we'll assume the above inequality is true for a choice of n, p, and arbitrary k and we'll show that it must therefore be true for n, p, and k + 1. If we look at the inequality for k + 1 we get:

Just like before, we are essentially asking the question, is -

This is why finding the explicit sum for the sequence is handy, and why I said we need to prove the inequality for large enough k. Clearly, sometimes this inequality will be true and sometimes it won't, depending on the respective values for p and k. Remember, though, p is a fixed value. If we can find values of k large enough to make this true, then we are in business (not too hard once we decide on p). So, here are some generalized Maple programs to incorporate different p values into the pattern.

How does this change all of our results? Below are some results of the generalized programs run with varying values for p and n (this is very computationally intensive, even for Maple). They are arranged in a matrix - so, if you want to know the program's calculation of how many patterns there are for base n with power p, you will look at the entry in the n - 1 column and the p - 1 row. Thus, base 10 with p = 2 is found in row 1, column 9 (the entry is 2, as we already know). I've only computed from bases 2 to 10 with p ranging from 2 to 4 because the program is so computationally intensive. However, feel free to use the programs to perform more calculations on your own system if you'd like, or better yet, write some more efficient programs (I have not spent much time optimizing these programs; I'm sure there could be better versions). Maybe you'll be able to speed up the computations enough that we can extend this matrix and discover new and more interesting patterns!