Skip to content

Instantly share code, notes, and snippets.

@nitaku
Last active December 31, 2015 08:39
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nitaku/7962295 to your computer and use it in GitHub Desktop.
Save nitaku/7962295 to your computer and use it in GitHub Desktop.
Random tree generator (prefixes)

See this Gist for the full source.

This python script generates a random tree (as in this example), and saves it in a CSV-compliant file (without header). Each line of the file represents a leaf node using a tuple serialization of its prefix, that indicate the path from the root node to the leaf.

Usage (ipython):

run generate_random_tree.py
generate('tree.csv', 6, 8) # filename, maximum depth, maximum branching factor

An example tree containing 85 leaves is included.

The serialization format is not common for trees, but useful in our study about fractal treemaps (also known as jigsaw treemaps). A better, more generic format for saving trees is, for example, the Newick format.

from __future__ import print_function
from random import random, randint
leaf_count = 0
def rand_tree(prefixes, d, MAX_D, MAX_N, index, prefix):
global leaf_count
# return a tree with maximum depth MAX_D that branches with probability p at most N times for each internal node
# p starts from 1 and decreases linearly with d, reaching zero at MAX_D
# this still seems to be necessary to avoid infinite recursion (floating point precision?)
if d == MAX_D:
prefixes.append(prefix[1:])
leaf_count += 1
print(leaf_count, end='\r')
p = float(MAX_D-d)/MAX_D
# if the tree branches, at least one branch is made
n = randint(1, MAX_N)
child_i = 0
for i in xrange(n):
if p >= random():
child_i += 1
rand_tree(prefixes, d+1, MAX_D, MAX_N, child_i, prefix+(index,))
else:
prefixes.append(prefix[1:])
leaf_count += 1
print(leaf_count, end='\r')
def generate(filename, max_d, max_n):
global leaf_count
leaf_count = 0
prefixes = []
rand_tree(prefixes, 1, max_d, max_n, 0, ())
prefixes.sort()
# write prefix tuples to the output file
with open(filename, 'w') as output:
for prefix in prefixes:
print(prefix, file=output)
We can make this file beautiful and searchable if this error is corrected: It looks like row 8 should actually have 1 column, instead of 2. in line 7.
()
()
()
()
()
()
()
(1, 1)
(1, 1)
(1, 1)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(1, 1, 2)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1, 2)
(2, 1, 2)
(2, 1, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 2)
(2, 3)
(2, 3)
(2, 3)
(2, 3)
(2, 3)
(2, 3)
(2, 3)
(2, 3)
(2, 3)
(2, 4)
(2, 4)
(3,)
(3, 1)
(3, 1)
(3, 2)
(3, 2)
(3, 2)
(3, 2)
(3, 2)
(3, 2)
(3, 2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment