The video gives a short impression of the new watershed mode. You could use this watershed mode, for example, to calculate a realistic distribution of vegetation in large-scale landscapes.
The central use case for ray casting in these add-ons is to check whether a point lies inside a mesh. This is determined by casting a ray and checking if the number of intersections is even or odd (where we count 0 as even). An even number of crossings means the point lies outside the mesh (true for the red points in the illustration) while an odd number means that the point lies inside the mesh (the green point). And the fun part is that this is true for any direction of the ray!
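The same parity rule is easy to demonstrate outside Blender. The sketch below (plain Python, no Blender API; the function name and the test polygon are made up for illustration) applies the even-odd test in 2D by casting a ray along +x and counting edge crossings:

```python
def inside_2d(point, polygon):
    # Even-odd rule: cast a ray along +x from 'point' and count how many
    # polygon edges it crosses; an odd count means the point is inside.
    x, y = point
    count = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:       # crossing lies to the right of the point
                count += 1
    return count % 2 == 1

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(inside_2d((1, 1), square))   # True
print(inside_2d((3, 1), square))   # False
```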
Code

We can easily convert this to Python using Blender's API: given an object and a point to test, we take some direction for our ray (here right along the x-direction) and check if there is an intersection. If so, we test again but start at the intersection point, taking care to move a tiny bit along the ray cast direction to avoid getting 'stuck' at the intersection point due to numerical imprecision. We count the number of intersections and return True if this number is odd.
```python
import bpy
from mathutils import Vector
from mathutils.bvhtree import BVHTree

def q_inside(ob, point_in_object_space):
    direction = Vector((1, 0, 0))
    epsilon = direction * 1e-6
    count = 0
    result, point_in_object_space, normal, index = ob.ray_cast(point_in_object_space, direction)
    while result:
        count += 1
        result, point_in_object_space, normal, index = ob.ray_cast(point_in_object_space + epsilon, direction)
    return (count % 2) == 1
```
Intersection with an object

We can now use this function to determine if a point lies within a mesh object:
```python
q_inside(ob, (0, 0, 0))
```

Note that the point to check is in object space, so if you want to check whether a point in world space lies within the mesh you should convert it first using the inverted world matrix of the object.
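For illustration, here is a hedged numpy-only sketch of that conversion (the 4x4 matrix below is made up; in Blender itself you would apply `ob.matrix_world.inverted()` to the point instead):

```python
import numpy as np

# A made-up world matrix: a pure translation by (1, 2, 3).
matrix_world = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
])
world_point = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous coordinates

# Inverting the world matrix maps world space back to object space.
object_point = np.linalg.inv(matrix_world) @ world_point
print(object_point[:3])  # [0. 0. 0.]
```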
Intersection with a precalculated BVH tree

To determine if a ray hits any of the possibly millions of polygons that make up a mesh, the ray_cast() method internally constructs something called a BVH tree. This is a rather costly operation, but fortunately a BVH tree is cached, so when we perform many intersection tests this overhead should pay off.
BVH trees, however, can also be constructed from a mesh object separately, and such a tree offers a ray_cast() method with exactly the same signature. This means we can perform the same check to see if a point lies inside a mesh with the following code:
```python
sc = bpy.context.scene
bvhtree = BVHTree.FromObject(ob, sc)
q_inside(bvhtree, (0, 0, 0))
```
Performance

When doing millions of tests on the same mesh a small amount of overhead averages out, so the question is whether it is meaningful to create a separate BVH tree. If we look at the time it takes to do a million tests for meshes of different sizes, we see that the overhead apparently is not the same, but for larger meshes the difference loses significance:
inside irregular mesh (subdivided Suzanne)
outside irregular mesh with mostly immediate initial hit

The initial data was for a point inside a mesh: such a point will always need at least two ray casts: the first will intersect some polygon, the second will veer off into infinity. If we test a point outside a mesh we will sometimes have only one ray cast and sometimes two. If we look at the latter scenario we see similar behavior:
outside irregular mesh with mostly no initial hits

A strange thing happens when the point to test is outside the mesh but the ray happens to point away from the mesh, so that we will have zero intersections. Now, for large meshes, the overhead of a separate BVH tree appears larger!
Conclusion

I cannot quite explain the behavior of the ray_cast() performance for larger meshes when the point is outside the mesh and the test ray points away, but in arbitrary situations and for moderately sized meshes a performance gain of 40% for the BVH tree is worthwhile when doing millions of tests.
In practice you would probably also try to avoid having seams line up in adjacent rows, but if this happens you can easily switch to another random seed.
Availability

As usual the latest version is available on GitHub.
Other floor board articles

This blog already sports quite a number of articles on this add-on. The easiest way to see them all is to select the floor board tag (on the right, or click the link).
Calculating the number of connections in a tree, or why numpy is awesome but a good algorithm even more so
the issue

Imagine you are creating trees, either pure data structures or ones that try to mimic real trees, and that you are interested in the number of connected branch segments. How would you calculate these numbers for each node in the tree?
If we have a small tree like the one in the illustration, where we have placed the node index inside each sphere, we might store for each node the index of its parent, like in the list:
p = [ -1, 0, 1, 2, 2, 4 ]
Note that node number 0 has -1 as the index of its parent to indicate that it has no parent because it is the root node. Also, nodes 3 and 4 have the same parent node because they represent a fork in a branch.
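To make the representation concrete, this small sketch (the children dict is purely an illustration, not part of the add-on) recovers each node's children from the parent list:

```python
# Parent list from the article: node i's parent index is p[i], root has -1.
p = [-1, 0, 1, 2, 2, 4]

# Invert the mapping: collect the children of each node.
children = {i: [] for i in range(len(p))}
for node, parent in enumerate(p):
    if parent >= 0:
        children[parent].append(node)

print(children)  # {0: [1], 1: [2], 2: [3, 4], 3: [], 4: [5], 5: []}
```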
naive solution

To calculate the number of connected nodes for each node in the tree we could simply iterate over all the nodes and traverse the list of parent nodes until we reach the root node, all the while adding 1 to the number of connections. This might look like this (the numbers on the left are profiling information):
```
  hits      time  t/hit
  7385      4037    0.5  for p in parents:
683441   1627791    2.4      while p >= 0:
676057    410569    0.6          connections[p] += 1
676057    351262    0.5          p = parents[p]
```

The result has the following values:
c = [5, 4, 3, 0, 1, 0]
The tips have no connections, while the root counts all nodes as connections minus itself, so our 6-node tree has a root with 5 connections. This is also shown in the illustration (on the right).
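The naive traversal can be sketched as a small self-contained function (the name count_connections is made up for illustration):

```python
def count_connections(parents):
    # For every node, walk up to the root, incrementing a counter
    # at each ancestor we pass through.
    connections = [0] * len(parents)
    for p in parents:
        while p >= 0:
            connections[p] += 1
            p = parents[p]
    return connections

print(count_connections([-1, 0, 1, 2, 2, 4]))  # [5, 4, 3, 0, 1, 0]
```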
If we time this simple algorithm for a moderately sized tree of just over 7000 nodes (which might look like the one in the image below, to give a sense of the complexity) I find that it takes about 1.16 seconds on my machine.
numpy implementation

Now we all know that numpy is good at working with large arrays of numbers (and is conveniently bundled with Blender), so we might try some simple changes. The code is almost identical, but with numpy arrays instead of Python lists:
```
  hits      time  t/hit
  7385      4563    0.6  for p in self.parent[:self.nverts]:
683441   1695443    2.5      while p >= 0:
676057   2004637    3.0          self.connections[p] += 1
676057    456995    0.7          p = self.parent[p]
```

But in fact this is slower! (2.68 seconds on my machine.) If we look at the timings from the excellent kernprof line profiler (the numbers on the left of the code snippets) we see that numpy spends an awful lot of time on both the iteration over the parents and especially the indexing of the connections array. Apparently indexing a single element in a numpy array is slower than the equivalent operation on a Python list. (There is a good explanation of why indexing a single element of a numpy array takes so long on StackOverflow.)
reworking the algorithm

It must be possible to do this faster, right? Yes it is, have a look at the following code:
```
hits   time  t/hit
   1      2    2.0  p = self.parent[:self.nverts]
   1     19   19.0  p = p[p >= 0]
 137     97    0.7  while len(p):
 136   1684   12.4      c = np.bincount(p)
 136   1013    7.4      self.connections[:len(c)] += c
 136   3395   25.0      p = self.parent[p]
 136   1547   11.4      p = p[p >= 0]
```

This produces the same result (trust me, I'll explain in a minute), yet uses only milliseconds (7 to be precise).
The trick is to use numpy code with the looping and indexing built in as much as possible:
bincount() counts the number of occurrences of each number and stores the count in the corresponding bin. So if 3 nodes have node 0 as their parent, c[0] will be 3. Next we add this array of counts in one go to the cumulative number of connections, and finally we get the parent indices, also in a single statement, and remove the -1 entries (those nodes that reached the root in their tree traversal). We repeat this as long as there are nodes that have not yet reached the root.
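Put together as a standalone function (the name is made up for illustration; the add-on works on class attributes instead), the vectorized version looks like this:

```python
import numpy as np

def count_connections_np(parents):
    # Vectorized count: advance every node's traversal toward the root
    # in lockstep, using bincount to tally all visits at once.
    parents = np.asarray(parents)
    connections = np.zeros(len(parents), dtype=np.int64)
    p = parents[parents >= 0]           # drop the root's -1 entry
    while len(p):
        c = np.bincount(p)              # how many traversals sit at each node
        connections[:len(c)] += c
        p = parents[p]                  # step every traversal up to its parent
        p = p[p >= 0]                   # drop traversals that reached the root
    return connections

print(count_connections_np([-1, 0, 1, 2, 2, 4]))  # [5 4 3 0 1 0]
```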
conclusion

Numpy might often be faster than pure Python, but it pays to check, using careful profiling. An appropriate algorithm that does away with as much Python overhead as possible may give you a sizeable speed improvement, although such an algorithm might not be immediately obvious to find.
Space tree pro

This bit of research is not just academic; I am currently investigating options to speed up tree generation as implemented in my Space Tree Pro Blender add-on (available on BlenderMarket).
Restricted random

This version adds a restricted random mode to the uv randomization. It still randomizes the position of planks on the uv-map but keeps them inside the original bounds. It does this simply by not moving the uv map of a plank at all when the move would result in any portion of the plank ending up outside the bounds. The result is a more compact randomized uv-map, which should make it somewhat easier to use with non-tiling textures, i.e. it reduces the area of the texture that is not used.
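A minimal sketch of that idea, with made-up names and a plank represented by its uv bounding box, might look like this:

```python
import random

# Hedged sketch: accept a random uv offset for a plank only if the moved
# rectangle stays entirely inside the [0,1] x [0,1] uv bounds; otherwise
# leave the plank where it is. All names here are illustrative.
def restricted_offset(umin, vmin, umax, vmax, rng=random):
    du = rng.uniform(-1.0, 1.0)
    dv = rng.uniform(-1.0, 1.0)
    if 0.0 <= umin + du and umax + du <= 1.0 and 0.0 <= vmin + dv and vmax + dv <= 1.0:
        return du, dv
    return 0.0, 0.0  # move rejected: keep the plank where it is
```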
The new version is of course available on GitHub.
With the results of these measurements in the back of my mind, and spurred by a discussion on BlenderArtists, I created a new version of the random vertex colors add-on (available on GitHub), which reduced the time to generate random vertex colors for approximately 1 million faces from 3.2 seconds to 0.8 seconds on my machine (or even 0.4s for a 32-bit variant), i.e. up to 8 times faster, which is not bad.
Some explanation

Assigning random vertex colors to faces means that you have to assign the same random color to all the loops of a polygon (for the difference between vertices and loops, check the BMesh design document; the same applies to the regular Mesh objects we use in the code below). This can be done in a straightforward manner once you have located the vertex color layer:
```python
mesh = context.scene.objects.active.data
vertex_colors = mesh.vertex_colors.active.data
polygons = mesh.polygons
for poly in polygons:
    color = [random(), random(), random()]
    for loop_index in range(poly.loop_start, poly.loop_start + poly.loop_total):
        vertex_colors[loop_index].color = color
```

Straightforward as this may be, a loop inside a loop is time consuming, and so is generating lots of random numbers, especially because Python does not do this in parallel even if you have a processor with multiple cores.
Fortunately for us, Blender comes bundled with Numpy, a library that can manipulate huge arrays of numbers in a very efficient manner. This allows for a much more efficient approach (although, as shown previously, a significant speed increase is only noticeable for large meshes).
```python
startloop = np.empty(npolygons, dtype=np.int)
numloops = np.empty(npolygons, dtype=np.int)
polygon_indices = np.empty(npolygons, dtype=np.int)
polygons.foreach_get('index', polygon_indices)
polygons.foreach_get('loop_start', startloop)
polygons.foreach_get('loop_total', numloops)
colors = np.random.random_sample((npolygons, 3))
```

We can even reduce storage (and get an additional speedup) if we change the types to 32-bit variants. There will be no loss of accuracy, as these are the sizes used by Blender internally. (If you were to do a lot of additional calculations this might be different, of course.) The change only alters the declarations of the arrays:
```python
startloop = np.empty(npolygons, dtype=np.int32)
numloops = np.empty(npolygons, dtype=np.int32)
polygon_indices = np.empty(npolygons, dtype=np.int32)
polygons.foreach_get('index', polygon_indices)
polygons.foreach_get('loop_start', startloop)
polygons.foreach_get('loop_total', numloops)
colors = np.random.random_sample((npolygons, 3)).astype(np.float32)
```

As shown above, we start out by creating Numpy arrays that will hold the startloop indices and the number of loops of each polygon, as well as an array that will hold the polygon indices themselves. This last one isn't strictly needed for assigning random values, because we don't care which random color we assign to which polygon, but for other scenarios it might make more sense, so we keep it here. We get all these indices from the Mesh object using the fast foreach_get method. We then use Numpy's built-in random_sample function to generate the random colors (3 random floats between 0 and 1) for all polygons.
```python
loopcolors = np.empty((nloops, 3))  # or: np.empty((nloops, 3), dtype=np.float32)
loopcolors[startloop] = colors[polygon_indices]
numloops -= 1
nz = np.flatnonzero(numloops)
while len(nz):
    startloop[nz] += 1
    loopcolors[startloop[nz]] = colors[polygon_indices[nz]]
    numloops[nz] -= 1
    nz = np.flatnonzero(numloops)
```

The real work is done in the code above: we first create an empty array to hold the colors for all individual loops. Then we assign to all the loops at index startloop the color that corresponds to the polygon they belong to. Note that startloop and polygon_indices are arrays of the same length, i.e. the number of polygons.

We also have an array numloops, which holds the number of loops for each polygon. We decrement this number of loops by one for all polygons in one go (line 3) and create an array of indices of those elements in numloops that are still greater than zero (line 4). If we are still left with one or more indices (line 5), we increment the startloop index for all those nonzero entries (line 6). Line 7 then assigns, again in one go, a polygon color to the loop at that index for every polygon that still has a nonzero number of loops, and finally we again decrement our numloops counters (line 8). Note that this all works because the loop indices of all loops of a polygon are consecutive.
```python
loopcolors = loopcolors.flatten()
vertex_colors.foreach_set("color", loopcolors)
```

The final lines flatten the array of loop colors to a 1-D array and write the colors back to the Mesh object with the fast foreach_set method.
Discussion

Now even though we end up with about four times as much code, it is much faster while still being quite readable, as long as you get your head around using index arrays. Basically our single while loop executes the inner loop of assigning colors to loops for all polygons in parallel. The penalty for this speed increase is memory consumption: we use 5 large numpy arrays here.
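The index-array trick can be tried outside Blender with a hand-made loop layout (the two triangles and one quad below are made up for illustration):

```python
import numpy as np

# Two triangles (3 loops each) and one quad (4 loops) -> 10 loops total.
startloop = np.array([0, 3, 6])
numloops = np.array([3, 3, 4])
polygon_indices = np.arange(3)
colors = np.random.random_sample((3, 3))
nloops = numloops.sum()

# Same loop as in the article: spread each polygon's color over its loops.
loopcolors = np.empty((nloops, 3))
loopcolors[startloop] = colors[polygon_indices]
numloops = numloops - 1
nz = np.flatnonzero(numloops)
while len(nz):
    startloop[nz] += 1
    loopcolors[startloop[nz]] = colors[polygon_indices[nz]]
    numloops[nz] -= 1
    nz = np.flatnonzero(numloops)

# every loop of a polygon now carries that polygon's color
print(loopcolors.shape)  # (10, 3)
```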
To use Numpy you would need to copy, for example, vertex coordinates first before performing any calculations, and then copy the results back again. This consumes extra memory and time. You wouldn't do this for a single operation like determining the average position of all vertices, because Python can perform this efficiently enough that the gain wouldn't compensate for the extra setup time. However, when calculating many iterations of some erosion algorithm, this might be well worth the cost.
In order to help in deciding whether this setup cost is worth the effort I decided to produce some benchmark values. My aim was to find the fastest method from several alternatives and to produce actual timings. Now your timings will be different of course but the trends will be comparable and I have provided the source code for my benchmark programs on GitHub.
results, copying to a Numpy array

First we have a look at how much time it takes to copy the vertex coordinates from a mesh to a Numpy array. Given the array of vertices me.vertices and a Numpy array verts, we have several options:
```python
# blue line
verts = np.array([v.co for v in me.vertices])

# red line
verts = np.fromiter((x for v in me.vertices for x in v.co), dtype=np.float64)
verts.shape = (len(me.vertices), 3)

# orange line
verts = np.fromiter((x for v in me.vertices for x in v.co), dtype=np.float64, count=len(me.vertices)*3)
verts.shape = (len(me.vertices), 3)

# green line
verts = np.empty(count*3, dtype=np.float64)
me.vertices.foreach_get('co', verts)
verts.shape = shape
```
The red line shows a huge increase in performance because we use a generator expression to access the coordinates. This does not create an intermediate list, but the result is a one-dimensional array. Changing the shape of an array in Numpy involves no copying, so it is very cheap.
We can still improve a little bit on our timing by preallocating the size for the Numpy array. The result is shown with the orange line.
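Outside Blender the same pattern can be tried on a plain list of coordinate tuples standing in for me.vertices (the data below is made up):

```python
import numpy as np

# Hedged stand-in for me.vertices: a list of (x, y, z) tuples.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Flatten via a generator expression; preallocating with count= avoids
# growing the array while filling it.
verts = np.fromiter((x for co in coords for x in co),
                    dtype=np.float64, count=len(coords) * 3)
verts.shape = (len(coords), 3)  # reshaping copies nothing
print(verts.shape)  # (3, 3)
```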
The green line shows the fastest option. Blender provides a fast method to access attributes in a property collection, and this method wins hands down (thanks to Linus Yng for pointing this out).
results, copying from a Numpy array

Copying data back from a Numpy array can also be done in more than one way; the most interesting ones are shown below:
```python
# blue line
for i, v in enumerate(me.vertices):
    v.co = verts[i]

# green line
for n, v in zip(verts, me.vertices):
    v.co = n

# light blue line
def f2(n, v):
    v.co = n
for v in itertools.starmap(f2, zip(verts, me.vertices)):
    pass

# orange line
verts.shape = count*3
me.vertices.foreach_set('co', verts)
verts.shape = shape
```

The conventional methods do not perform significantly differently, but again Blender's built-in foreach_set method outperforms all other options by a mile, just like its foreach_get counterpart.
Discussion

The actual timings on my machine (an Intel Core i7 running at 4GHz) indicate that we can copy about 1.87 million vertex coordinates per second from a Blender mesh to a Numpy array, and about 1.12 million vertex coordinates per second from a Numpy array to a Blender mesh, using 'conventional methods'. The 50% difference might be due to the fact that the coordinate data in a Blender mesh is scattered over internal memory (as compared to Numpy's contiguous block of RAM), and writing to scattered memory incurs a bigger hit from cache misses and the like than reading from it. Note that a vertex location consists of three 64-bit floating point numbers.
With foreach_get and foreach_set this performance is greatly improved, to 13.8 million vertex coordinates per second and 10.8 million vertex coordinates per second respectively (on my machine).
Overall, this performance means that on my machine even a simple scaling operation on a million vertices might already be faster with Numpy, as shown in the first example below, which on my machine is about 7x faster. (Not that scaling is such a perfect example; chances are Blender's scale operator is faster, but it gives you an idea of how much there is to gain, even for simple operations that are not natively available in Blender.)
```python
# numpy code with fast Blender foreach
count = len(me.vertices)
shape = (count, 3)
fac = np.array([1.1, 1.1, 1.1])
verts = np.empty(count*3, dtype=np.float64)
me.vertices.foreach_get('co', verts)
verts.shape = shape
np.multiply(verts, fac, out=verts)
verts.shape = count*3
me.vertices.foreach_set('co', verts)
verts.shape = shape

# 'classic code'
fac = Vector((1.1, 1.1, 1.1))
for v in me.vertices:
    v.co *= fac
```