python,sequence,primes,collatz,memorization , finding the lowest collatz sequence that gives more that 65 primes

Question:

Tag: python,sequence,primes,collatz,memorization

I have a task where I need to find the lowest Collatz sequence that contains more than 65 prime numbers in Python.

For example, the Collatz sequence for 19 is:

19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1

This sequence contains 7 prime numbers.

I also need to use memoization so it doesn't have to run a "year" to find it. I found code for memoization of Collatz sequences, but I can't figure out how to get it to work when I need only the prime numbers.

Here is the Collatz memoization code that I found:

``````lookup = {}
def countTerms(n):
if n not in lookup:
if n == 1:
lookup[n] = 1
elif not n % 2:
lookup[n] = countTerms(n / 2)[0] + 1
else:
lookup[n] = countTerms(n*3 + 1)[0] + 1

return lookup[n], n
``````

And here is my tester for prime:

``````def is_prime(a):
for i in xrange(2,a):
if a%i==0:
#print a, " is not a prime number"
return False
if a==1:
return False
else:
return True
``````

Your existing code is incorrectly indented. I assume this task is a homework task, so I won't post a complete working solution, but I'll give you some helpful snippets.

First, here's a slightly more efficient primality tester. Rather than testing if all numbers less than `a` are factors of `a`, it just tests up to the square root of `a`.

``````def is_prime(a):
for i in xrange(2, int(1 + a ** 0.5)):
if a % i == 0:
return False
return True
``````

Note that this function returns `True` for `a = 1`. That's ok, since you don't need to test 1: you can pre-load it into the `lookup` dict:

``````lookup = {1:0}
``````

Your `countTerms` function needs to be modified slightly so that it only adds one to the `lookup` count when the current `n` is prime. In Python, `False` has a numeric value of 0 and `True` has a numeric value of 1. That's very handy here:

``````def count_prime_terms(n):
''' Count the number of primes terms in a Collatz sequence '''
if n not in lookup:
if n % 2:
next_n = n * 3 + 1
else:
next_n = n // 2

lookup[n] = count_prime_terms(next_n) + is_prime(n)
return lookup[n]
``````

I've changed the function name to be more Pythonic.

FWIW, the first Collatz sequence containing 65 or more primes actually contains 67 primes. Its seed number is over 1.8 million, and the highest number that requires primality testing when checking all sequences up to that seed is 151629574372. At completion, the `lookup` dict contains 3920492 entries.

In response to James Mills's comments regarding recursion, I've written a non-recursive version, and to make it easy to see that the iterative and the recursive versions both produce the same results I'm posting a complete working program. I said above that I wasn't going to do that, but I figure that it should be ok to do so now, since spørreren has already written their program using the info I supplied in my original answer.

I fully agree that it's good to avoid recursion except in situations where it's appropriate to the problem domain (eg, tree traversal). Python discourages recursion - it cannot optimize tail-call recursion and it imposes a recursion depth limit (although that limit can be modified, if desired).

This Collatz sequence prime counting algorithm is naturally stated recursively, but it's not too hard to do iteratively - we just need a list to temporarily hold the sequence while the primality of all its members are being determined. True, this list takes up RAM, but it's (probably) much more efficient space-wise than the stack frame requirements that the recursive version needs.

The recursive version reaches a recursion depth of 343 when solving the problem in the OP. This is well within the default limit but it's still not good, and if you want to search for sequences containing much larger numbers of primes, you will hit that limit.

The iterative & recursive versions run at roughly the same speed (at least, they do so on my machine). To solve the problem stated in the OP they both take a little under 2 minutes. This is significantly faster than my original solution, mostly due to optimizations in primality testing.

The basic Collatz sequence generation step already needs to determine if a number is odd or even. Clearly, if we already know that a number is even then there's no need to test if it's a prime. :) And we can also eliminate tests for even factors in the `is_prime` function. We can handle the fact that 2 is prime by simply loading the result for 2 into the `lookup` cache.

On a related note, when searching for the first sequence containing a given number of primes we don't need to test any of the sequences that start at an even number. Even numbers (apart from 2) don't increase the prime count of a sequence, and since the first odd number in such a sequence will be lower than our current number its results will already be in the `lookup` cache, assuming we start our search from 3. And if we don't start searching from 3 we just need to make sure our starting seed is low enough so that we don't accidentally miss the first sequence containing the desired number of primes. Adopting this strategy not only reduces the time needed, it also reduces the number of entries in the lookup` cache.

``````#!/usr/bin/env python

''' Find the 1st Collatz sequence containing a given number of prime terms

From http://stackoverflow.com/q/29920691/4014959

Written by PM 2Ring 2015.04.29

[Seed == 1805311, prime count == 67]
'''

import sys

def is_prime(a):
''' Test if odd `a` >= 3 is prime '''
for i in xrange(3, int(1 + a ** 0.5), 2):
if not a % i:
return 0
return 1

#Track the highest number generated so far; use a list
# so we don't have to declare it as global...
hi = [2]

#Cache for sequence prime counts. The key is the sequence seed,
# the value is the number of primes in that sequence.
lookup = {1:0, 2:1}

def count_prime_terms_iterative(n):
''' Count the number of primes terms in a Collatz sequence
Iterative version '''
seq = []
while n not in lookup:
if n > hi[0]:
hi[0] = n

if n % 2:
seq.append((n, is_prime(n)))
n = n * 3 + 1
else:
seq.append((n, 0))
n = n // 2

count = lookup[n]
for n, isprime in reversed(seq):
count += isprime
lookup[n] = count

return count

def count_prime_terms_recursive(n):
''' Count the number of primes terms in a Collatz sequence
Recursive version '''
if n not in lookup:
if n > hi[0]:
hi[0] = n

if n % 2:
next_n = n * 3 + 1
isprime = is_prime(n)
else:
next_n = n // 2
isprime = 0
lookup[n] = count_prime_terms(next_n) + isprime

return lookup[n]

def find_seed(numprimes, start):
''' Find the seed of the 1st Collatz sequence containing
`numprimes` primes, starting from odd seed `start` '''
i = start
mcount = 0

print 'seed, prime count, highest term, dict size'
while mcount < numprimes:
count = count_prime_terms(i)
if count > mcount:
mcount = count
print i, count, hi[0], len(lookup)
i += 2

#count_prime_terms = count_prime_terms_recursive
count_prime_terms = count_prime_terms_iterative

def main():
if len(sys.argv) > 1:
numprimes = int(sys.argv[1])
else:
print 'Usage: %s numprimes [start]' % sys.argv[0]
exit()

start = int(sys.argv[2]) if len(sys.argv) > 2 else 3

#Round `start` up to the next odd number
if start % 2 == 0:
start += 1

find_seed(numprimes, start)

if __name__ == '__main__':
main()
``````

When run with

`\$ ./CollatzPrimes.py 65`

the output is

``````seed, prime count, highest term, dict size
3 3 16 8
7 6 52 18
19 7 160 35
27 25 9232 136
97 26 9232 230
171 28 9232 354
231 29 9232 459
487 30 39364 933
763 32 250504 1626
1071 36 250504 2197
4011 37 1276936 8009
6171 43 8153620 12297
10971 44 27114424 21969
17647 48 27114424 35232
47059 50 121012864 94058
99151 51 1570824736 198927
117511 52 2482111348 235686
202471 53 17202377752 405273
260847 55 17202377752 522704
481959 59 24648077896 966011
963919 61 56991483520 1929199
1564063 62 151629574372 3136009
1805311 67 151629574372 3619607
``````