Embree

High Performance Ray Tracing Kernels

We recently released Embree v2.16.4!

Embree API

The Embree API is a low-level ray tracing API that supports defining and committing geometry and performing ray queries of different types. Static and dynamic scenes are supported, which may contain triangle geometries, quad geometries, line segment geometries, hair geometries, analytic Bézier curves, subdivision meshes, instanced geometries, and user-defined geometries. Multi-segment motion blur is supported for each geometry type, including transformation motion blur of instances. The supported ray queries are finding the closest scene intersection along a ray, and testing a ray segment for any intersection with the scene. Single rays as well as ray packets in a structure-of-arrays layout can be used, for packet sizes of 1, 4, 8, and 16 rays. Using the ray stream interface, a stream of an arbitrary number M of ray packets of arbitrary size N can be processed. Filter callback functions are supported, which get invoked for every intersection encountered during traversal.

The Embree API exists in a C++ and an ISPC version. This document describes the C++ version of the API; the ISPC version is almost identical. The only differences are that the ISPC version needs some ISPC-specific uniform type modifiers, and that it has special functions that operate on ray packets of the native SIMD size the ISPC code is compiled for.

Embree supports two modes for a scene, the normal mode and the stream mode, which require different ray queries and callbacks. The normal mode is the default, but we will switch entirely to the ray stream mode in a later release.

The application should include the embree2/rtcore.h and embree2/rtcore_ray.h files, but none of the other header files. When using the ISPC version of the API, it should include embree2/rtcore.isph and embree2/rtcore_ray.isph instead.

#include <embree2/rtcore.h>
#include <embree2/rtcore_ray.h>

All API calls carry the prefix rtc which stands for ray tracing core. Embree supports a device concept, which allows different components of the application to use the API without interfering with each other. You have to create at least one Embree device through the rtcNewDevice call. Before the application exits it should delete all devices by invoking rtcDeleteDevice. An application typically creates a single device only, and should create only a small number of devices.

RTCDevice device = rtcNewDevice(NULL);
...
rtcDeleteDevice(device);

It is strongly recommended to have the Flush to Zero and Denormals are Zero mode of the MXCSR control and status register enabled for each thread before calling the rtcIntersect and rtcOccluded functions. Otherwise, under some circumstances special handling of denormalized floating point numbers can significantly reduce application and Embree performance. When using Embree together with the Intel® Threading Building Blocks, it is sufficient to execute the following code at the beginning of the application main thread (before the creation of the tbb::task_scheduler_init object):

#include <xmmintrin.h>
#include <pmmintrin.h>
...
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
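Because the recommendation applies to every thread that calls rtcIntersect or rtcOccluded, an application that creates its own worker threads (instead of relying on TBB) may find it convenient to wrap the two calls above in a helper. A minimal sketch, assuming an x86 target; the helper name enableFtzDaz is hypothetical:

```cpp
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE, _MM_DENORMALS_ZERO_ON

// Hypothetical helper: enables Flush to Zero and Denormals are Zero in the
// MXCSR register of the calling thread. Call it at the start of every
// application-created thread that invokes rtcIntersect or rtcOccluded.
inline void enableFtzDaz()
{
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // denormal results are flushed to 0
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // denormal inputs are read as 0
}
```

Each worker thread would call enableFtzDaz() once, before its first ray query.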

Embree reads implementation-specific configuration settings from the following locations, in the order listed:

  1. configuration string passed to the rtcNewDevice function
  2. .embree2 file in the application folder
  3. .embree2 file in the home folder

Settings read later override earlier ones. This way the configuration for the application can be changed globally (either through the rtcNewDevice call or through the .embree2 file in the application folder), and each user has the option to adjust the configuration to their needs. The application can make Embree ignore the configuration files by passing ignore_config_files=1 to rtcNewDevice.

API calls that access geometries are only thread safe as long as different geometries are accessed; accesses to the same geometry have to be serialized by the application. All other API calls are thread safe. The API calls are reentrant, so it is safe to trace new rays and create new geometry when intersecting a user-defined object.

Each user thread has its own error flag per device. If an error occurs when invoking an API function, this flag is set to the corresponding error code, unless it already stores a previous error. The rtcDeviceGetError function returns the currently stored error and clears the error flag.

Possible error codes returned by rtcDeviceGetError are:

Return values of rtcDeviceGetError.
Error Code              Description
RTC_NO_ERROR            No error occurred.
RTC_UNKNOWN_ERROR       An unknown error has occurred.
RTC_INVALID_ARGUMENT    An invalid argument was specified.
RTC_INVALID_OPERATION   The operation is not allowed for the specified object.
RTC_OUT_OF_MEMORY       There is not enough memory left to complete the operation.
RTC_UNSUPPORTED_CPU     The CPU is not supported as it does not support SSE2.
RTC_CANCELLED           The operation was cancelled by a Memory Monitor Callback or Progress Monitor Callback function.

When device construction fails, rtcNewDevice returns NULL. To obtain the error code of such a failed device construction, pass NULL as the device to rtcDeviceGetError. For all other invocations of rtcDeviceGetError, a valid device has to be specified.

Using the rtcDeviceSetErrorFunction2 call, it is also possible to set a callback function that is called whenever an error occurs for a device.

typedef void (*RTCErrorFunc2)(void* userPtr, const RTCError code, const char* str);
void rtcDeviceSetErrorFunction2(RTCDevice device, RTCErrorFunc2 func, void* userPtr);

When invoked, the registered callback function receives the user-defined pointer userPtr, the error code code, and a string str that describes the error further. Passing NULL as the function pointer to rtcDeviceSetErrorFunction2 disables the callback again. The error flags described above are also set when an error callback function is registered.

Scene

A scene is a container for a set of geometries of potentially different types. A scene is created using the rtcDeviceNewScene function call, and destroyed using the rtcDeleteScene function call. Two types of scenes are supported, dynamic and static scenes. Different flags specify the type of scene to create and the type of ray query operations that can later be performed on the scene. The following example creates a scene that supports dynamic updates and the single ray rtcIntersect and rtcOccluded calls.

RTCScene scene = rtcDeviceNewScene(device, RTC_SCENE_DYNAMIC, RTC_INTERSECT1);
...
rtcDeleteScene(scene);

Using the following scene flags the user can select between creating a static or dynamic scene.

Dynamic type flags for rtcDeviceNewScene.
Scene Flag          Description
RTC_SCENE_STATIC    Scene is optimized for static geometry.
RTC_SCENE_DYNAMIC   Scene is optimized for dynamic geometry.

A dynamic scene is created by invoking rtcDeviceNewScene with the RTC_SCENE_DYNAMIC flag. Geometries can then be created inside that scene; they are enabled by default. Once the scene geometry is specified, an rtcCommit call finishes the scene description and triggers building of the internal data structures. After the rtcCommit call it is safe to perform ray queries of the types specified at scene construction time. Geometries can be disabled (rtcDisable call), enabled again (rtcEnable call), and deleted (rtcDeleteGeometry call). Geometries can also be modified, including their vertex and index arrays. After modifying a geometry, rtcUpdate or rtcUpdateBuffer has to be called for that geometry to specify which buffers were modified. Each modified buffer can be specified separately using the rtcUpdateBuffer function, whereas the rtcUpdate function simply tags every buffer of the geometry as modified. If geometries were enabled, disabled, deleted, or modified, rtcCommit has to be invoked before performing any ray queries on the scene; otherwise the effect of the ray query is undefined. During an rtcCommit call, modifications to the scene are not allowed.

A static scene is created by the rtcDeviceNewScene call with the RTC_SCENE_STATIC flag. Geometries can only get created, enabled, disabled and modified until the first rtcCommit call. After the rtcCommit call, each access to any geometry of that static scene is invalid. Geometries that got created inside a static scene can only get deleted by deleting the entire scene.

Modifying geometry, building hierarchies using rtcCommit, and tracing rays always have to happen separately, never at the same time.

Embree silently ignores primitives that would cause numerical issues, e.g. primitives containing NaNs, INFs, or values greater than 1.844E18f.
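Since such primitives are dropped without an error, an application may want to detect them itself before filling the buffers. A minimal sketch that mirrors the criteria stated above; the function names are hypothetical, and treating "greater than 1.844E18f" as a limit on the absolute value is an assumption (Embree's internal test may differ in details):

```cpp
#include <cmath>

// Same layout as the triangle/quad vertex struct used below (x, y, z + padding).
struct Vertex { float x, y, z, a; };

// Hypothetical check: the coordinate is finite (no NaN, no INF) and its
// magnitude does not exceed the 1.844E18f limit mentioned in the text.
inline bool isUsableCoordinate(float c)
{
  return std::isfinite(c) && std::fabs(c) <= 1.844E18f;
}

inline bool isUsableVertex(const Vertex& v)
{
  return isUsableCoordinate(v.x) && isUsableCoordinate(v.y) && isUsableCoordinate(v.z);
}

// A triangle is only usable if all three referenced vertices are usable.
inline bool isUsableTriangle(const Vertex* vertices, int v0, int v1, int v2)
{
  return isUsableVertex(vertices[v0]) && isUsableVertex(vertices[v1]) &&
         isUsableVertex(vertices[v2]);
}
```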

The following flags can be used to tune the used acceleration structure. These flags are only hints and may be ignored by the implementation.

Acceleration structure flags for rtcDeviceNewScene.
Scene Flag               Description
RTC_SCENE_COMPACT        Creates a compact data structure and avoids algorithms that consume much memory.
RTC_SCENE_COHERENT       Optimize for coherent rays (e.g. primary rays).
RTC_SCENE_INCOHERENT     Optimize for incoherent rays (e.g. diffuse reflection rays).
RTC_SCENE_HIGH_QUALITY   Build higher quality spatial data structures.

The following flags can be used to tune the traversal algorithm that is used by Embree. These flags are only hints and may be ignored by the implementation.

Traversal algorithm flags for rtcDeviceNewScene.
Scene Flag         Description
RTC_SCENE_ROBUST   Avoid optimizations that reduce arithmetic accuracy.

The third argument of the rtcDeviceNewScene function holds the algorithm flags, which specify the ray queries the application requires. Calling a ray query API function that was not enabled at scene creation time is not allowed. Further, the application should only request ray queries that are really needed, to give Embree the most freedom in choosing the best algorithm. E.g. if Embree implements no packet traversers for some data structure that is highly optimized for single rays, then this data structure cannot be used if the user enables any ray packet query.

Enabled algorithm flags for rtcDeviceNewScene.
Algorithm Flag         Description
RTC_INTERSECT1         Enables the rtcIntersect and rtcOccluded functions (single ray interface) for this scene.
RTC_INTERSECT4         Enables the rtcIntersect4 and rtcOccluded4 functions (4-wide packet interface) for this scene.
RTC_INTERSECT8         Enables the rtcIntersect8 and rtcOccluded8 functions (8-wide packet interface) for this scene.
RTC_INTERSECT16        Enables the rtcIntersect16 and rtcOccluded16 functions (16-wide packet interface) for this scene.
RTC_INTERSECT_STREAM   Enables the rtcIntersect1M, rtcOccluded1M, rtcIntersect1Mp, rtcOccluded1Mp, rtcIntersectNM, rtcOccludedNM, rtcIntersectNp, and rtcOccludedNp functions for this scene.
RTC_INTERPOLATE        Enables the rtcInterpolate and rtcInterpolateN interpolation functions.

Embree supports two modes for a scene, the normal mode and the stream mode. These modes mainly differ in the kind of callbacks invoked and in how rays are extended with user data. The normal mode is enabled by default; the ray stream mode can be enabled using the RTC_INTERSECT_STREAM algorithm flag for a scene. The stream API functions rtcIntersect1M, rtcIntersect1Mp, rtcIntersectNM, and rtcIntersectNp, as well as their occlusion variants, can only be used in ray stream mode.

The scene bounding box can be read with the function rtcGetBounds(RTCScene scene, RTCBounds& bounds_o), which writes the AABB of the scene to bounds_o. Time-varying bounds can be obtained using the rtcGetLinearBounds(RTCScene scene, RTCBounds* bounds_o) function, which writes two AABBs to bounds_o. Linearly interpolating these bounds to a specific time t yields bounds that enclose the geometry at that time. Invoking these functions is only valid when all scene changes have been committed using rtcCommit.
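The interpolation step can be sketched as follows. This assumes that the first box written by rtcGetLinearBounds corresponds to t = 0 and the second to t = 1; the Box struct is a hypothetical stand-in and does not reproduce the field layout of RTCBounds:

```cpp
// Hypothetical stand-in for one axis-aligned box.
struct Box { float lower[3]; float upper[3]; };

// Linearly interpolates the two boxes from rtcGetLinearBounds
// (assumed: b0 at t = 0, b1 at t = 1) to the time t in [0, 1].
inline Box lerpBounds(const Box& b0, const Box& b1, float t)
{
  Box r;
  for (int i = 0; i < 3; ++i) {
    r.lower[i] = (1.0f - t) * b0.lower[i] + t * b1.lower[i];
    r.upper[i] = (1.0f - t) * b0.upper[i] + t * b1.upper[i];
  }
  return r;
}
```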

Geometries

Geometries are always contained in the scene they are created in. Each geometry is assigned an integer ID at creation time, which is unique for that scene. The current version of the API supports triangle meshes (rtcNewTriangleMesh2), quad meshes (rtcNewQuadMesh2), Catmull-Clark subdivision surfaces (rtcNewSubdivisionMesh2), curve geometries (rtcNewBezierCurveGeometry2), hair geometries (rtcNewBezierHairGeometry2), single level instances of other scenes (rtcNewInstance3), and user defined geometries (rtcNewUserGeometry3). The API is designed in a way that easily allows adding new geometry types in later releases.

The application can either manage geometry IDs itself or let Embree allocate them. For this purpose, all geometry creation functions have a geomID parameter, which can be set to RTC_INVALID_GEOMETRY_ID to let Embree allocate a geometry ID (the default) or to a geometry ID allocated by the application.

If the application allocates a geometry ID, this ID has to be unused in the scene; otherwise the creation of the geometry fails. Further, application-allocated geometry IDs should be compact, as Embree internally creates a vector whose size equals the largest geometry ID used. Using very large geometry IDs for small scenes would thus cause memory consumption and performance overhead.

If Embree allocates the geometry IDs, the following properties hold. For dynamic scenes, IDs are assigned sequentially starting from 0, as long as no geometry was deleted. If geometries were deleted, the implementation reuses IDs later on in an implementation-dependent way. Consequently, sequential assignment is no longer guaranteed, but the IDs still form a compact range. These rules allow the application to manage a dynamic array to efficiently map from geometry IDs to its own geometry representation. For static scenes, geometry IDs are assigned sequentially starting at 0, which allows the application to use a fixed-size array for this mapping.
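The dynamic array mentioned above could look like the following sketch. MyGeometry and GeometryTable are hypothetical application types; the table grows on demand and keeps null entries for IDs that are currently unused, which stays small because the IDs form a compact range:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical application-side geometry record.
struct MyGeometry { const char* name; };

// Maps Embree geometry IDs to application geometry via a vector indexed by geomID.
class GeometryTable {
public:
  void set(unsigned geomID, MyGeometry* geom) {
    if (geomID >= table_.size()) table_.resize(std::size_t(geomID) + 1, nullptr);
    table_[geomID] = geom;
  }
  // Call after rtcDeleteGeometry; the ID may be handed out again later.
  void clear(unsigned geomID) {
    if (geomID < table_.size()) table_[geomID] = nullptr;
  }
  MyGeometry* get(unsigned geomID) const {
    return geomID < table_.size() ? table_[geomID] : nullptr;
  }
private:
  std::vector<MyGeometry*> table_;
};
```

After a ray query, the geomID of the hit can be passed to get() to find the application object.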

Alternatively, the application can use the function void rtcSetUserData(RTCScene scene, unsigned geomID, void* ptr) to set a user data pointer ptr to its own geometry representation, and later read this pointer back using the function void* rtcGetUserData(RTCScene scene, unsigned geomID).

The following geometry flags can be specified at construction time of geometries:

Flags for the creation of new geometries.
Geometry Flag             Description
RTC_GEOMETRY_STATIC       The geometry is considered static and is expected to be modified rarely by the application. This flag has to be used in static scenes.
RTC_GEOMETRY_DEFORMABLE   The geometry is considered to deform in a coherent way, e.g. a skinned character. The connectivity of the geometry has to stay constant, thus modifying the index array is not allowed. The implementation is free to choose a BVH refitting approach for handling meshes tagged with that flag.
RTC_GEOMETRY_DYNAMIC      The geometry is considered highly dynamic and changes frequently, possibly in an unstructured way. Embree will rebuild data structures from scratch for this type of geometry.

Triangle Meshes

Triangle meshes are created using the rtcNewTriangleMesh2 function call, and potentially deleted using the rtcDeleteGeometry function call.

The number of triangles, number of vertices, and optionally the number of time steps for multi-segment motion blur have to get specified at construction time of the mesh. The user can also specify additional flags that choose the strategy to handle that mesh in dynamic scenes. The following example demonstrates how to create a triangle mesh without motion blur:

unsigned geomID = rtcNewTriangleMesh2(scene, geomFlags,
                                      numTriangles, numVertices, 1);

The triangle indices can be set by mapping and writing to the index buffer (RTC_INDEX_BUFFER), and the triangle vertices by mapping and writing to the vertex buffer (RTC_VERTEX_BUFFER). The index buffer contains an array of triangles, each consisting of three 32-bit indices, while the vertex buffer contains an array of vertices, each consisting of three float values. The vertex buffer can be at most 16GB large. When the vertex buffer is managed internally, the stride between vertices is 16 bytes. For multi-segment motion blur, a vertex buffer has to be specified for each time step, and all these buffers have to have the same stride. All buffers have to be unmapped before an rtcCommit call to the scene.

struct Vertex   { float x, y, z, a; };
struct Triangle { int v0, v1, v2; };

Vertex* vertices = (Vertex*) rtcMapBuffer(scene, geomID, RTC_VERTEX_BUFFER);
// fill vertices here
rtcUnmapBuffer(scene, geomID, RTC_VERTEX_BUFFER);

Triangle* triangles = (Triangle*) rtcMapBuffer(scene, geomID, RTC_INDEX_BUFFER);
// fill triangle indices here
rtcUnmapBuffer(scene, geomID, RTC_INDEX_BUFFER);
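Between rtcMapBuffer and rtcUnmapBuffer the mapped pointers are ordinary memory, so the "fill" steps above are plain writes. A minimal sketch that writes a unit square as two triangles; the function names are hypothetical, and the test uses local arrays in place of mapped buffers so that it runs without Embree:

```cpp
// Same layouts as above; the padding member a makes each vertex 16 bytes,
// matching the stride of internally managed vertex buffers.
struct Vertex   { float x, y, z, a; };
struct Triangle { int v0, v1, v2; };
static_assert(sizeof(Vertex) == 16, "vertex stride must be 16 bytes");
static_assert(sizeof(Triangle) == 12, "three 32-bit indices per triangle");

// Writes 4 vertices (mesh created with numVertices = 4).
void fillUnitSquareVertices(Vertex* vertices)
{
  vertices[0] = {0.0f, 0.0f, 0.0f, 0.0f};
  vertices[1] = {1.0f, 0.0f, 0.0f, 0.0f};
  vertices[2] = {1.0f, 1.0f, 0.0f, 0.0f};
  vertices[3] = {0.0f, 1.0f, 0.0f, 0.0f};
}

// Writes 2 triangles (mesh created with numTriangles = 2).
void fillUnitSquareTriangles(Triangle* triangles)
{
  triangles[0] = {0, 1, 2};
  triangles[1] = {0, 2, 3};
}
```

With Embree, these functions would be called on the pointers returned by rtcMapBuffer for RTC_VERTEX_BUFFER and RTC_INDEX_BUFFER, respectively, before unmapping.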

Also see tutorial Triangle Geometry for an example of how to create triangle meshes.

The parametrization of a triangle uses the first vertex p0 as base point, and the vector p1 - p0 as u-direction and p2 - p0 as v-direction. The following picture additionally illustrates the direction the geometry normal is pointing into.

Texture coordinates t0, t1, t2 given at the three triangle vertices can be linearly interpolated over the triangle as follows:

t_uv = (1-u-v)*t0 + u*t1 + v*t2
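The formula translates directly into code; u and v are the hit coordinates reported for the triangle. A minimal sketch with a hypothetical two-component Vec2 type:

```cpp
// Hypothetical two-component texture coordinate.
struct Vec2 { float s, t; };

// t_uv = (1-u-v)*t0 + u*t1 + v*t2
inline Vec2 interpolateTriangle(Vec2 t0, Vec2 t1, Vec2 t2, float u, float v)
{
  const float w = 1.0f - u - v;  // weight of the base vertex p0
  return { w * t0.s + u * t1.s + v * t2.s,
           w * t0.t + u * t1.t + v * t2.t };
}
```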

Quad Meshes

Quad meshes are created using the rtcNewQuadMesh2 function call, and potentially deleted using the rtcDeleteGeometry function call.

The number of quads, number of vertices, and optionally the number of time steps for multi-segment motion blur have to get specified at construction time of the mesh. The user can also specify additional flags that choose the strategy to handle that mesh in dynamic scenes. The following example demonstrates how to create a quad mesh without motion blur:

unsigned geomID = rtcNewQuadMesh2(scene, geomFlags,
                                  numQuads, numVertices, 1);

The quad indices can be set by mapping and writing to the index buffer (RTC_INDEX_BUFFER), and the quad vertices by mapping and writing to the vertex buffer (RTC_VERTEX_BUFFER). The index buffer contains an array of quads, each consisting of four 32-bit indices, while the vertex buffer contains an array of vertices, each consisting of three float values. The vertex buffer can be at most 16GB large. When the vertex buffer is managed internally, the stride between vertices is 16 bytes. For multi-segment motion blur, a vertex buffer has to be specified for each time step, and all these buffers have to have the same stride. All buffers have to be unmapped before an rtcCommit call to the scene.

struct Vertex { float x, y, z, a; };
struct Quad   { int v0, v1, v2, v3; };

Vertex* vertices = (Vertex*) rtcMapBuffer(scene, geomID, RTC_VERTEX_BUFFER);
// fill vertices here
rtcUnmapBuffer(scene, geomID, RTC_VERTEX_BUFFER);

Quad* quads = (Quad*) rtcMapBuffer(scene, geomID, RTC_INDEX_BUFFER);
// fill quad indices here
rtcUnmapBuffer(scene, geomID, RTC_INDEX_BUFFER);

A quad is internally handled as a pair of triangles v0,v1,v3 and v2,v3,v1, where the u’/v’ coordinates of the second triangle are corrected by u = 1-u' and v = 1-v' to produce a quad parametrization in which u and v go from 0 to 1.
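The text implies that the reported u/v already use the quad parametrization; the sketch below only spells out the stated correction, e.g. for an application that splits quads into triangles itself. The function quadUVFromTriangle and the triangle numbering (0 for v0,v1,v3 and 1 for v2,v3,v1) are hypothetical:

```cpp
struct QuadUV { float u, v; };

// Maps local coordinates (u', v') on one of the two internal triangles to the
// quad parametrization:
//   triangle 0 = (v0,v1,v3): u = u',     v = v'
//   triangle 1 = (v2,v3,v1): u = 1 - u', v = 1 - v'
inline QuadUV quadUVFromTriangle(int triangle, float uLocal, float vLocal)
{
  if (triangle == 0) return { uLocal, vLocal };
  return { 1.0f - uLocal, 1.0f - vLocal };
}
```

For a parallelogram with v0 = (0,0), v1 = (1,0), v2 = (1,1), v3 = (0,1), the point v2 + u'(v3 - v2) + v'(v1 - v2) equals (1 - u', 1 - v'), which is where the correction comes from.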

To encode a triangle as a quad just replicate the last triangle vertex (v0,v1,v2 -> v0,v1,v2,v2). This way the quad mesh can be used to represent a mixed mesh which contains triangles and quads.
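Building the index buffer for such a mixed mesh can be sketched as follows; the input format (one index list per face) and the function name are hypothetical:

```cpp
#include <stdexcept>
#include <vector>

struct Quad { int v0, v1, v2, v3; };

// Converts faces with 3 or 4 vertex indices into quad index buffer entries;
// triangles get their last vertex replicated (v0,v1,v2 -> v0,v1,v2,v2).
std::vector<Quad> toQuadIndexBuffer(const std::vector<std::vector<int>>& faces)
{
  std::vector<Quad> quads;
  quads.reserve(faces.size());
  for (const std::vector<int>& f : faces) {
    if (f.size() == 3)      quads.push_back({f[0], f[1], f[2], f[2]});
    else if (f.size() == 4) quads.push_back({f[0], f[1], f[2], f[3]});
    else throw std::invalid_argument("only triangles and quads are supported");
  }
  return quads;
}
```

The resulting vector's size is the numQuads value to pass to rtcNewQuadMesh2, and its contents can be copied into the mapped RTC_INDEX_BUFFER.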

Subdivision Surfaces

Catmull-Clark subdivision surfaces for meshes consisting of faces with up to 15 vertices (e.g. triangles, quadrilaterals, pentagons, etc.) are supported, including support for edge creases, vertex creases, holes, non-manifold geometry, and face-varying interpolation.

A subdivision surface is created using the rtcNewSubdivisionMesh2 function call, and deleted again using the rtcDeleteGeometry function call.

 unsigned rtcNewSubdivisionMesh2(RTCScene scene,
                                 RTCGeometryFlags flags,
                                 size_t numFaces,
                                 size_t numEdges,
                                 size_t numVertices,
                                 size_t numEdgeCreases,
                                 size_t numVertexCreases,
                                 size_t numCorners,
                                 size_t numHoles,
                                 size_t numTimeSteps,
                                 unsigned int geomID);

The number of faces (numFaces), edges/indices (numEdges), vertices (numVertices), edge creases (numEdgeCreases), vertex creases (numVertexCreases), holes (numHoles), and time steps (numTimeSteps) have to get specified at construction time.

The following buffers have to be set up by the application: the face buffer (RTC_FACE_BUFFER) contains the number of edges/indices (3 to 15) of each of the numFaces faces; the index buffer (RTC_INDEX_BUFFER) contains 3 to 15 32-bit vertex indices for each face, numEdges indices in total; the vertex buffer (RTC_VERTEX_BUFFER) stores numVertices vertices as single precision x, y, z floating point coordinates aligned to 16 bytes. The value of the 4th float used for alignment can be arbitrary.
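Since numEdges is the total number of indices, it is the sum of the face buffer entries. A minimal sketch that computes it, enforces the 3 to 15 limit, and derives the position of each face's first index in the index buffer; the function names are hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// numEdges for rtcNewSubdivisionMesh2: sum of the per-face vertex counts.
std::size_t countEdges(const std::vector<unsigned>& faceSizes)
{
  std::size_t numEdges = 0;
  for (unsigned n : faceSizes) {
    if (n < 3 || n > 15) throw std::invalid_argument("faces must have 3 to 15 vertices");
    numEdges += n;
  }
  return numEdges;
}

// Offset of each face's first index in the index buffer (exclusive prefix sum).
std::vector<std::size_t> faceOffsets(const std::vector<unsigned>& faceSizes)
{
  std::vector<std::size_t> offsets(faceSizes.size());
  std::size_t sum = 0;
  for (std::size_t f = 0; f < faceSizes.size(); ++f) {
    offsets[f] = sum;
    sum += faceSizes[f];
  }
  return offsets;
}
```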

Optionally, the application may fill additional index buffers if multiple topologies are required for face-varying interpolation. The standard vertex buffers RTC_VERTEX_BUFFER are always bound to the geometry topology (topology 0) and thus use RTC_INDEX_BUFFER0. Data interpolation may use different topologies, as described later.

Optionally, the application can setup the hole buffer (RTC_HOLE_BUFFER) with numHoles many 32 bit indices of faces that should be considered non-existing in all topologies.

Optionally, the application can fill the level buffer (RTC_LEVEL_BUFFER) with a tessellation rate for each of the edges of each face, making a total of numEdges values. The tessellation level is a positive floating point value that specifies how many quads along the edge should be generated during tessellation. If no level buffer is specified, a level of 1 is used. The maximally supported edge level is 4096, and larger levels are clamped to that value. Note that an edge may be shared between (typically 2) faces. To guarantee a watertight tessellation, the level of such shared edges has to be exactly identical. A uniform tessellation rate for an entire subdivision mesh can be set using the rtcSetTessellationRate(RTCScene scene, unsigned geomID, float rate) function. A level buffer, if present, takes precedence over the uniform tessellation rate.
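An application can check the watertightness requirement before committing. The sketch below assumes (this is not stated above) that level entry k of a face belongs to the edge from the face's k-th vertex to its (k+1)-th vertex, wrapping around, so that the level buffer runs parallel to the index buffer; the function names are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Larger levels are clamped to the maximally supported edge level.
inline float clampLevel(float level) { return std::min(level, 4096.0f); }

// Returns false if any edge shared between faces has differing levels.
bool sharedEdgeLevelsMatch(const std::vector<unsigned>& faceSizes,
                           const std::vector<unsigned>& indices,
                           const std::vector<float>& levels)
{
  std::map<std::pair<unsigned, unsigned>, float> seen;  // unoriented edge -> level
  std::size_t base = 0;
  for (unsigned n : faceSizes) {
    for (unsigned k = 0; k < n; ++k) {
      const unsigned a = indices[base + k];
      const unsigned b = indices[base + (k + 1) % n];
      const std::pair<unsigned, unsigned> edge(std::min(a, b), std::max(a, b));
      const float level = clampLevel(levels[base + k]);
      auto it = seen.find(edge);
      if (it == seen.end()) seen.emplace(edge, level);
      else if (it->second != level) return false;
    }
    base += n;
  }
  return true;
}
```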

Optionally, the application can fill the sparse edge crease buffers to make some edges appear sharper. The edge crease index buffer (RTC_EDGE_CREASE_INDEX_BUFFER) contains numEdgeCreases pairs of 32-bit vertex indices that specify unoriented edges in the geometry topology. The edge crease weight buffer (RTC_EDGE_CREASE_WEIGHT_BUFFER) stores a positive floating point weight for each of these crease edges. The larger this weight, the sharper the edge. Specifying a weight of infinity is supported and marks an edge as infinitely sharp. Storing an edge multiple times with the same crease weight is allowed, but has lower performance. Storing an edge multiple times with different crease weights results in undefined behavior. For a stored edge (i,j), the reverse edge (j,i) does not have to be stored, as both are considered the same edge. Edge crease features are specified for the geometry topology, but copied to all other topologies automatically.

Optionally, the application can fill the sparse vertex crease buffers to make some vertices appear sharper. The vertex crease index buffer (RTC_VERTEX_CREASE_INDEX_BUFFER) contains numVertexCreases 32-bit vertex indices that specify a set of vertices from the geometry topology. The vertex crease weight buffer (RTC_VERTEX_CREASE_WEIGHT_BUFFER) specifies a positive floating point weight for each of these vertices. The larger this weight, the sharper the vertex. Specifying a weight of infinity is supported and makes the vertex infinitely sharp. Storing a vertex multiple times with the same crease weight is allowed, but has lower performance. Storing a vertex multiple times with different crease weights results in undefined behavior. Vertex crease features are specified for the geometry topology, but copied to all other topologies automatically.

Faces with 3 to 15 vertices are supported (triangles, quadrilaterals, pentagons, etc.).

The user can also specify a geometry mask and additional flags that choose the strategy to handle that subdivision mesh in dynamic scenes.

The implementation of subdivision surfaces uses an internal software cache, which can get configured to some desired size (see Configuring Embree).

Parametrization

The parametrization of a regular quadrilateral uses the first vertex p0 as base point, and the vector p1 - p0 as u-direction and p3 - p0 as v-direction. The following picture additionally illustrates the direction the geometry normal is pointing into.

Texture coordinates t0, t1, t2, t3 given at the four vertices can be bilinearly interpolated over the quadrilateral as follows:

t_uv = (1-v)*((1-u)*t0 + u*t1) + v*((1-u)*t3 + u*t2)
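In code, the bilinear formula is two interpolations along u (edges p0 to p1 and p3 to p2) followed by one along v. A minimal sketch with a hypothetical Vec2 type; for smooth interpolation over the subdivided surface, rtcInterpolate remains the recommended route:

```cpp
// Hypothetical two-component texture coordinate.
struct Vec2 { float s, t; };

// t0 at (u,v) = (0,0), t1 at (1,0), t2 at (1,1), t3 at (0,1).
inline Vec2 interpolateQuad(Vec2 t0, Vec2 t1, Vec2 t2, Vec2 t3, float u, float v)
{
  auto lerp = [](float a, float b, float x) { return (1.0f - x) * a + x * b; };
  const Vec2 bottom = { lerp(t0.s, t1.s, u), lerp(t0.t, t1.t, u) };  // edge p0 -> p1
  const Vec2 top    = { lerp(t3.s, t2.s, u), lerp(t3.t, t2.t, u) };  // edge p3 -> p2
  return { lerp(bottom.s, top.s, v), lerp(bottom.t, top.t, v) };
}
```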

All other face types, where the number of vertices is not equal to 4, use a special parametrization in which the ID of the n’th quadrilateral (as obtained by a single subdivision step) and the local hit location inside this quadrilateral are encoded in the UV coordinates. The following piece of code extracts the sub-patch ID i and the UVs of this sub-patch:

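// U, V: hit coordinates reported for a non-quadrilateral face
// fracf(x) denotes the fractional part x - floorf(x)
const unsigned l = (unsigned) floorf(0.5f*U);  // sub-patch column
const unsigned h = (unsigned) floorf(0.5f*V);  // sub-patch row
const unsigned i = 4*h+l;                      // sub-patch ID
const float u = 2.0f*fracf(0.5f*U)-0.5f;       // local u in [-0.5,1.5[
const float v = 2.0f*fracf(0.5f*V)-0.5f;       // local v in [-0.5,1.5[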

This encoding allows local subpatch UVs to be in the range [-0.5,1.5[, so negative subpatch UVs can be passed to rtcInterpolate to sample subpatches slightly out of bounds. This can be useful to calculate derivatives using finite differences if required. The encoding further has the property that you can move by some value du (or dv) on a subpatch simply by adding du (or dv) to the special UV encoding, as long as you do not leave the [-0.5,1.5[ range. Further, derivatives calculated using finite differences are compatible with derivatives calculated using rtcInterpolate when using the standard formula:

dF(u)/du = (F(u+du)-F(u))/du
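The decoding snippet above can be made self-contained and paired with its inverse, which makes the "add du to U" property easy to check. The encoder is derived here by solving the decoder for U and V and is a hypothetical helper, not part of the Embree API:

```cpp
#include <cmath>

struct SubPatchUV { unsigned id; float u, v; };

// Fractional part, as used by the decoding snippet.
inline float fracf(float x) { return x - std::floor(x); }

// Same arithmetic as the decoding snippet above.
inline SubPatchUV decodeSubPatch(float U, float V)
{
  const unsigned l = (unsigned) std::floor(0.5f * U);
  const unsigned h = (unsigned) std::floor(0.5f * V);
  return { 4 * h + l, 2.0f * fracf(0.5f * U) - 0.5f, 2.0f * fracf(0.5f * V) - 0.5f };
}

// Inverse of decodeSubPatch for local u, v in [-0.5, 1.5[ (hypothetical helper).
inline void encodeSubPatch(unsigned id, float u, float v, float& U, float& V)
{
  U = 2.0f * float(id % 4) + u + 0.5f;
  V = 2.0f * float(id / 4) + v + 0.5f;
}
```

For example, moving by du = 0.5 on sub-patch 5 only requires adding 0.5 to U, as long as the local u stays below 1.5.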

To smoothly interpolate texture coordinates over the subdivision surface we recommend using the rtcInterpolate function, which will apply the standard subdivision rules for interpolation and automatically take care of the special UV encoding for non-quadrilaterals.

Face-Varying Data

Face-varying interpolation is supported through multiple topologies per subdivision mesh and binding such topologies to user vertex buffers to interpolate. This way texture coordinates may use a different topology with additional boundaries to construct separate UV regions inside one subdivision mesh.

Each such topology consists of an index buffer and a subdivision mode. Up to 16 topologies are supported, with corresponding index buffers RTC_INDEX_BUFFER0+i, with i in the range 0 to 15.

Each of the 16 supported user vertex buffers RTC_USER_VERTEX_BUFFER0+j (j in the range 0 to 15) can be assigned to some topology using the rtcSetIndexBuffer call:

void rtcSetIndexBuffer(RTCScene scene, unsigned geomID,
                       RTCBufferType vertexBuffer, RTCBufferType indexBuffer);

The face buffer (RTC_FACE_BUFFER) is shared between all topologies, which means that the n’th primitive always has the same number of vertices (e.g. being a triangle or a quad) for each topology. However, the indices of the topologies themselves may be different.

Subdivision Mode

The subdivision modes can be used to force linear interpolation for some parts of the subdivision mesh.

Subdivision modes supported by Embree.
Boundary Mode                Description
RTC_SUBDIV_NO_BOUNDARY       Boundary patches are ignored. This way each rendered patch has a full set of control vertices.
RTC_SUBDIV_SMOOTH_BOUNDARY   The sequence of boundary control points is used to generate a smooth B-spline boundary curve (default mode).
RTC_SUBDIV_PIN_CORNERS       Corner vertices are pinned to their location during subdivision.
RTC_SUBDIV_PIN_BOUNDARY      All vertices at the border are pinned to their location during subdivision. This way the boundary is interpolated linearly.
RTC_SUBDIV_PIN_ALL           All vertices are pinned to their location during subdivision. This way all patches are linearly interpolated.

These modes can be set for each topology separately using the rtcSetSubdivisionMode API call, which has the following signature:

void rtcSetSubdivisionMode(RTCScene scene, unsigned geomID,
                           unsigned topologyID, RTCSubdivisionMode mode);

These modes are typically used to interpolate face-varying data properly. E.g. the topology used to interpolate texture coordinates is typically assigned the RTC_SUBDIV_PIN_BOUNDARY mode, so that texels at the border of the texture are also mapped to the mesh.

Also see tutorial Subdivision Geometry for an example of how to create subdivision surfaces.

Line Segment Hair Geometry

Line segments are supported to render hair geometry. A line segment consists of a start and end point, and start and end radius. Individual line segments are considered to be subpixel sized which allows the implementation to approximate the intersection calculation. This in particular means that zooming onto one line segment might show geometric artifacts.

Line segments are created using the rtcNewLineSegments2 function call, and potentially deleted using the rtcDeleteGeometry function call.

The number of line segments, the number of vertices, and optionally the number of time steps for multi-segment motion blur have to get specified at construction time of the line segment geometry.

The segment indices can be set by mapping and writing to the index buffer (RTC_INDEX_BUFFER) and the vertices can be set by mapping and writing into the vertex buffer (RTC_VERTEX_BUFFER). In case of motion blur, the vertex buffers (RTC_VERTEX_BUFFER0+t) have to get filled for each time step t.

The index buffer contains an array of 32-bit indices, each pointing to the first of two consecutive vertices, while the vertex buffer stores all control points as a single precision position and radius in x, y, z, r order in memory. The radii have to be greater than or equal to zero. All buffers have to be unmapped before an rtcCommit call to the scene.
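For hair stored as polylines, the index buffer follows directly from the strand lengths: a strand of n vertices contributes n-1 segments, each indexed by its first vertex. A minimal sketch, assuming strands are stored back to back in the vertex buffer; the function name is hypothetical:

```cpp
#include <vector>

// strandSizes: vertex count of each strand, in vertex buffer order.
// Returns one index per segment, pointing at the segment's first vertex;
// no segment connects the last vertex of one strand to the next strand.
std::vector<int> buildSegmentIndices(const std::vector<int>& strandSizes)
{
  std::vector<int> indices;
  int first = 0;  // index of the current strand's first vertex
  for (int n : strandSizes) {
    for (int k = 0; k + 1 < n; ++k) indices.push_back(first + k);
    first += n;
  }
  return indices;
}
```

The size of the returned vector is the number of line segments to pass to rtcNewLineSegments2, and the sum of strandSizes is the number of vertices.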

The intersection with the line segment primitive stores the parametric hit location along the line segment as u-coordinate (range [0, 1]; v is always set to zero). The geometry normal Ng is filled with the tangent, i.e. the vector from the start to the end vertex.

As for triangle meshes, the user can also specify a geometry mask and additional flags that choose the strategy to handle that mesh in dynamic scenes.

The following example demonstrates how to create some line segment geometry:

unsigned geomID = rtcNewLineSegments2(scene, geomFlags, numCurves,
                                      numVertices, 1);

struct Vertex { float x, y, z, r; };

Vertex* vertices = (Vertex*) rtcMapBuffer(scene, geomID, RTC_VERTEX_BUFFER);
// fill vertices here
rtcUnmapBuffer(scene, geomID, RTC_VERTEX_BUFFER);

int* curves = (int*) rtcMapBuffer(scene, geomID, RTC_INDEX_BUFFER);
// fill indices here
rtcUnmapBuffer(scene, geomID, RTC_INDEX_BUFFER);

Spline Hair Geometry

Hair geometries are supported, which consist of multiple hairs represented as cubic spline curves with varying radius per control point. As spline basis we currently support Bézier splines and B-splines. Individual hairs are considered to be subpixel sized which allows the implementation to approximate the intersection calculation. This in particular means that zooming onto one hair might show geometric artifacts.

Hair geometries are created using the rtcNewBezierHairGeometry2 or rtcNewBSplineHairGeometry2 function call, and potentially deleted using the rtcDeleteGeometry function call.

The number of hair curves, the number of vertices, and optionally the number of time steps for multi-segment motion blur have to be specified at construction time of the hair geometry.

The curve indices can be set by mapping and writing to the index buffer (RTC_INDEX_BUFFER), and the control vertices can be set by mapping and writing to the vertex buffer (RTC_VERTEX_BUFFER). In case of motion blur, the vertex buffers RTC_VERTEX_BUFFER0+t must be filled for each time step t.

The index buffer contains an array of 32 bit indices, each pointing to the first of the four control vertices of a curve, while the vertex buffer stores all control points as a single precision position and radius, stored in x, y, z, r order in memory. The hair radii must be greater than or equal to zero. All buffers must be unmapped before calling rtcCommit on the scene.
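Since each curve is identified only by the index of its first control vertex, the application decides how neighboring curves of one strand share control points. A common convention, assumed here rather than prescribed by Embree, is that consecutive Bézier segments share one endpoint (stride 3) and consecutive B-spline segments share three control points (stride 1). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bezier: consecutive curves share one endpoint, so a strand with
// numCtrl = 3*n + 1 control points yields n curves starting every 3 vertices.
void appendBezierStrand(std::vector<std::uint32_t>& indices,
                        std::uint32_t first, std::uint32_t numCtrl)
{
  assert(numCtrl >= 4 && (numCtrl - 1) % 3 == 0);
  for (std::uint32_t k = 0; 3*k + 3 < numCtrl; ++k)
    indices.push_back(first + 3*k);
}

// B-spline: consecutive curves share three control points, so a strand with
// numCtrl control points yields numCtrl - 3 curves starting at every vertex.
void appendBSplineStrand(std::vector<std::uint32_t>& indices,
                         std::uint32_t first, std::uint32_t numCtrl)
{
  assert(numCtrl >= 4);
  for (std::uint32_t k = 0; k + 3 < numCtrl; ++k)
    indices.push_back(first + k);
}
```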

The intersection with the hair primitive stores the parametric hit location along the hair as u-coordinate (range 0 to +1), and the normalized distance as the v-coordinate (range -1 to +1). The geometry normal Ng is filled with the tangent of the curve at the hit location on the curve (dPdu).
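For a cubic Bézier segment, the tangent reported in Ng is the curve derivative dPdu, which can be evaluated as below, for example to orient shading frames on the application side. This is a sketch for the Bézier basis only; the Vec3 type and helper names are hypothetical, and a B-spline segment would use the derivative of its own basis.

```cpp
#include <cassert>

// Hypothetical minimal vector type.
struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 madd(Vec3 acc, float s, Vec3 v) { return {acc.x + s*v.x, acc.y + s*v.y, acc.z + s*v.z}; }

// Derivative of a cubic Bezier segment with control points p[0..3]:
// dP/du = 3(1-u)^2 (p1 - p0) + 6u(1-u) (p2 - p1) + 3u^2 (p3 - p2)
Vec3 bezierTangent(const Vec3 p[4], float u)
{
  assert(u >= 0.0f && u <= 1.0f);
  const float s = 1.0f - u;
  Vec3 d = {0.0f, 0.0f, 0.0f};
  d = madd(d, 3.0f*s*s, sub(p[1], p[0]));
  d = madd(d, 6.0f*u*s, sub(p[2], p[1]));
  d = madd(d, 3.0f*u*u, sub(p[3], p[2]));
  return d;
}
```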

The implementation may choose to subdivide the curve into multiple cylinder-like primitives. The number of cylinders the curve is subdivided into can be specified per hair geometry through the rtcSetTessellationRate(RTCScene scene, unsigned geomID, float rate) function. By default, the tessellation rate for hair curves is 4.

As for triangle meshes, the user can also specify a geometry mask and additional flags that choose the strategy for handling the geometry in dynamic scenes.

The following example demonstrates how to create some hair geometry:

unsigned geomID = rtcNewBezierHairGeometry2(scene, geomFlags, numCurves, numVertices);

struct Vertex { float x, y, z, r; };

Vertex* vertices = (Vertex*) rtcMapBuffer(scene, geomID, RTC_VERTEX_BUFFER);
// fill vertices here
rtcUnmapBuffer(scene, geomID, RTC_VERTEX_BUFFER);

int* curves = (int*) rtcMapBuffer(scene, geomID, RTC_INDEX_BUFFER);
// fill indices here
rtcUnmapBuffer(scene, geomID, RTC_INDEX_BUFFER);

Also see tutorial Hair for an example of how to create and use hair geometry.

Spline Curve Geometry

The spline curve geometry consists of multiple cubic spline curves with varying radius per control point. Bézier splines and B-splines are currently supported as spline bases. The curve surface is defined as the sweep surface obtained by sweeping a circle of varying radius along the curve. As a limitation, the radius of the curve has to be smaller than the curvature radius of the curve at each location on the curve. In contrast to hair geometry, the curve geometry is rendered properly even in close-ups.
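The radius limitation can be checked on the application side before committing the geometry. The sketch below samples a cubic Bézier segment and compares the radius against the radius of curvature 1/κ, with κ = |P' × P''| / |P'|³. Sampling can miss a narrow violation, so treat it as a heuristic; interpolating the radius with the same Bézier basis as the position, and all names used, are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type and helpers.
struct Vec3 { float x, y, z; };
static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(float s, Vec3 a) { return {s*a.x, s*a.y, s*a.z}; }
static Vec3 cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float length(Vec3 a) { return std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z); }

// Returns true if, at every sampled u, the radius is below the radius of
// curvature 1/kappa, with kappa = |P' x P''| / |P'|^3 (only checks the samples).
bool radiusBelowCurvatureRadius(const Vec3 p[4], const float r[4], int samples)
{
  assert(samples >= 2);
  for (int i = 0; i < samples; ++i) {
    const float u = float(i) / float(samples - 1);
    const float s = 1.0f - u;
    // First and second derivatives of the cubic Bezier segment.
    const Vec3 d1 = add(add(scale(3.0f*s*s, sub(p[1], p[0])),
                            scale(6.0f*u*s, sub(p[2], p[1]))),
                        scale(3.0f*u*u, sub(p[3], p[2])));
    const Vec3 d2 = add(scale(6.0f*s, add(sub(p[2], scale(2.0f, p[1])), p[0])),
                        scale(6.0f*u, add(sub(p[3], scale(2.0f, p[2])), p[1])));
    // Assumption: the radius uses the same Bezier basis as the position.
    const float radius = s*s*s*r[0] + 3.0f*u*s*s*r[1] + 3.0f*u*u*s*r[2] + u*u*u*r[3];
    const float speed = length(d1);
    // radius < 1/kappa  <=>  radius * |P' x P''| < |P'|^3 (no division by zero curvature)
    if (!(radius * length(cross(d1, d2)) < speed*speed*speed))
      return false;
  }
  return true;
}
```

A straight segment has zero curvature and always passes; a degenerate segment with a zero derivative always fails.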

Curve geometries are created using the rtcNewBezierCurveGeometry2 or rtcNewBSplineCurveGeometry2 function call, and potentially deleted using the rtcDeleteGeometry function call.

The number of curves, the number of vertices, and optionally the number of time steps for multi-segment motion blur have to be specified at construction time of the curve geometry.

The curve indices can be set by mapping and writing to the index buffer (RTC_INDEX_BUFFER), and the control vertices can be set by mapping and writing to the vertex buffer (RTC_VERTEX_BUFFER). In case of motion blur, the vertex buffers RTC_VERTEX_BUFFER0+t must be filled for each time step t.

The index buffer contains an array of 32 bit indices, each pointing to the first of the four control vertices of a curve, while the vertex buffer stores all control points as a single precision position and radius, stored in x, y, z, r order in memory. The curve radii must be greater than or equal to zero. All buffers must be unmapped before calling rtcCommit on the scene.

As for triangle meshes, the user can also specify a geometry mask and additional flags that choose the strategy for handling the curves in dynamic scenes.

Also see tutorial Curves for an example of how to create and use Bézier curve geometries.

User Defined Geometry

User defined geometries make it possible to extend Embree with arbitrary types of user defined primitives. This is achieved by introducing arrays of user primitives as a special geometry type.

User geometries are created using the rtcNewUserGeometry3 function call, and potentially deleted using the rtcDeleteGeometry function call.

When creating a user defined geometry, the user has to set a data pointer, a bounding function closure (function and user pointer) as well as user defined intersect and occluded callback function pointers. The bounding function is used to query the bounds of all timesteps of a user primitive, while the intersect and occluded callback functions are called to intersect the primitive with a ray.

The bounding function to register has the following signature:

typedef void (*RTCBoundsFunc3)(void* userPtr, void* geomUserPtr, size_t id, size_t timeStep, RTCBounds& bounds_o);

and can be registered using the rtcSetBoundsFunction3 API function:

rtcSetBoundsFunction3(scene, geomID, userBoundsFunction, userPtr);

When the bounding callback is called, it is passed a user defined pointer specified at registration time of the bounds function (userPtr parameter), the per geometry user data pointer (geomUserPtr parameter), the ID of the primitive to calculate the bounds for (id parameter), the time step at which to calculate the bounds (timeStep parameter) and a memory location to write the calculated bound to (bounds_o parameter).
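A bounds function for a hypothetical primitive, a sphere whose center is given per time step, could look as follows. The Bounds struct is a stand-in using the lower_x … upper_z member names we assume RTCBounds has, so that the sketch compiles without Embree; in real code the last parameter would be RTCBounds&. MovingSphere and sphereBounds are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in assumed to mirror the member names of Embree 2's RTCBounds.
struct Bounds {
  float lower_x, lower_y, lower_z, align0;
  float upper_x, upper_y, upper_z, align1;
};

// Hypothetical user primitive: a sphere with one center per time step (2 steps).
struct MovingSphere { float center[2][3]; float radius; };

// Same parameter order as RTCBoundsFunc3: userPtr, geomUserPtr, id, timeStep, bounds_o.
void sphereBounds(void* /*userPtr*/, void* geomUserPtr, std::size_t id,
                  std::size_t timeStep, Bounds& bounds_o)
{
  assert(timeStep < 2);
  const MovingSphere& s = static_cast<const MovingSphere*>(geomUserPtr)[id];
  const float* c = s.center[timeStep];   // center at the requested time step
  bounds_o.lower_x = c[0] - s.radius;  bounds_o.upper_x = c[0] + s.radius;
  bounds_o.lower_y = c[1] - s.radius;  bounds_o.upper_y = c[1] + s.radius;
  bounds_o.lower_z = c[2] - s.radius;  bounds_o.upper_z = c[2] + s.radius;
}
```

Because Embree asks for the bounds of each time step separately, the function never has to bound the whole motion path itself.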

The signatures of the user defined intersect and occluded functions supported in normal mode are as follows:

typedef void (*RTCIntersectFunc  ) (                   void* userDataPtr, RTCRay& ray, size_t item);
typedef void (*RTCIntersectFunc4 ) (const void* valid, void* userDataPtr, RTCRay4& ray, size_t item);
typedef void (*RTCIntersectFunc8 ) (const void* valid, void* userDataPtr, RTCRay8& ray, size_t item);
typedef void (*RTCIntersectFunc16) (const void* valid, void* userDataPtr, RTCRay16& ray, size_t item);

The RTCIntersectFunc callback function operates on single rays and gets passed the user data pointer of the user geometry (userDataPtr parameter), the ray to intersect (ray parameter), and the ID of the primitive to intersect (item parameter). The RTCIntersectFunc4/8/16 callback functions operate on ray packets of size 4, 8 and 16 and additionally get an integer valid mask as input (valid parameter). The callback functions should not modify any ray that is disabled by that valid mask.

In stream mode, the following callback functions have to be used:

typedef void (*RTCIntersectFuncN ) (const int*  valid, void* userDataPtr, const RTCIntersectContext* context, RTCRayN* rays, size_t N, size_t item);
typedef void (*RTCIntersectFunc1Mp)(                   void* userDataPtr, const RTCIntersectContext* context, RTCRay** rays, size_t M, size_t item);

The RTCIntersectFuncN callback function supports ray packets of arbitrary size N. The RTCIntersectFunc1Mp callback function gets an array of M pointers to single rays as input.

The user intersect function should return without modifying the ray structure if the user geometry is missed. If an intersection of the user primitive with the ray segment is found, the intersect function has to update the hit information of the ray (tfar, u, v, Ng, geomID, primID components).

The user occluded function should also return without modifying the ray structure if the user geometry is missed. If the geometry is hit, it should set the geomID member of the ray to 0.
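The hit/miss contract of the two callbacks can be illustrated with a ray-sphere test. The sketch uses a stand-in Ray struct containing only the members the callbacks touch (not the real RTCRay layout), and the Sphere type with its stored geomID, the helper names, and the choice u = v = 0 are assumptions. Real callbacks would take RTCRay& and be registered with rtcSetIntersectFunction and rtcSetOccludedFunction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in holding only the RTCRay members the callbacks touch (not the real layout).
struct Ray {
  float org[3], dir[3];
  float tnear, tfar;
  float Ng[3];
  float u, v;
  unsigned geomID, primID;
};

// Hypothetical user primitive; geomID is the ID returned when the geometry was created.
struct Sphere { float center[3]; float radius; unsigned geomID; };

static float dot3(const float a[3], const float b[3]) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

// Finds the nearest root of |org + t*dir - center|^2 = radius^2 inside [tnear, tfar].
static bool hitDistance(const Sphere& s, const Ray& ray, float& t)
{
  const float oc[3] = { ray.org[0] - s.center[0], ray.org[1] - s.center[1], ray.org[2] - s.center[2] };
  const float a = dot3(ray.dir, ray.dir);
  assert(a > 0.0f && "ray direction must be non-zero");
  const float b = 2.0f * dot3(oc, ray.dir);
  const float c = dot3(oc, oc) - s.radius * s.radius;
  const float disc = b*b - 4.0f*a*c;
  if (disc < 0.0f) return false;
  const float root = std::sqrt(disc);
  const float t0 = (-b - root) / (2.0f*a);
  const float t1 = (-b + root) / (2.0f*a);
  if (t0 >= ray.tnear && t0 <= ray.tfar) { t = t0; return true; }
  if (t1 >= ray.tnear && t1 <= ray.tfar) { t = t1; return true; }
  return false;
}

// Same shape as RTCIntersectFunc: a miss leaves the ray untouched.
void sphereIntersect(void* userDataPtr, Ray& ray, std::size_t item)
{
  const Sphere& s = static_cast<const Sphere*>(userDataPtr)[item];
  float t;
  if (!hitDistance(s, ray, t)) return;
  ray.tfar = t;
  ray.u = 0.0f;  ray.v = 0.0f;        // this sketch defines no surface parameterization
  for (int k = 0; k < 3; ++k)        // unnormalized normal: hit point minus center
    ray.Ng[k] = ray.org[k] + t * ray.dir[k] - s.center[k];
  ray.geomID = s.geomID;
  ray.primID = static_cast<unsigned>(item);
}

// Same shape as the occluded callback: on a hit, only geomID is set, to 0.
void sphereOccluded(void* userDataPtr, Ray& ray, std::size_t item)
{
  const Sphere& s = static_cast<const Sphere*>(userDataPtr)[item];
  float t;
  if (hitDistance(s, ray, t))
    ray.geomID = 0;
}
```

Because the roots are only accepted inside [tnear, tfar], a closer hit found earlier (which already shortened tfar) is never overwritten by a farther one.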

When performing ray queries using the rtcIntersect and rtcOccluded functions, callbacks of type RTCIntersectFunc are invoked for user geometries. Consequently, an application that only traces single rays only has to provide the single-ray intersect and occluded callbacks. Similarly, when calling the rtcIntersect4/8/16 and rtcOccluded4/8/16 functions, the RTCIntersectFunc4/8/16 callbacks of matching packet size and type are called.

If ray stream mode is enabled for the scene, only the RTCIntersectFuncN and RTCIntersectFunc1Mp callbacks can be used. In this case, specifying an RTCIntersectFuncN callback is mandatory, and the RTCIntersectFunc1Mp callback is optional. Trying to set any other type of user callback function results in an error.

The following example illustrates creating a user geometry consisting of two user primitives:

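int numTimeSteps = 2;
struct UserObject { ... };

// Matches RTCBoundsFunc3: computes the bounds of primitive i at time step t.
void userBoundsFunction(void* userPtr, void* geomUserPtr, size_t i, size_t t, RTCBounds& bounds)
{
  const UserObject* objects = (const UserObject*) geomUserPtr;
  bounds = <bounds of objects[i] at time step t>;
}

// Matches RTCIntersectFunc: userDataPtr is the pointer set with rtcSetUserData.
void userIntersectFunction(void* userDataPtr, RTCRay& ray, size_t i)
{
  const UserObject* objects = (const UserObject*) userDataPtr;
  if (<ray misses objects[i] at time ray.time>)
    return;                       // miss: leave the ray unmodified
  <update ray.tfar, ray.u, ray.v, ray.Ng, ray.geomID, and ray.primID>;
}

// Same signature as the intersect function; only reports whether anything is hit.
void userOccludedFunction(void* userDataPtr, RTCRay& ray, size_t i)
{
  const UserObject* objects = (const UserObject*) userDataPtr;
  if (<ray misses objects[i] at time ray.time>)
    return;
  ray.geomID = 0;                 // mark the ray as occluded
}

...

UserObject* userGeomPtr = new UserObject[2];
userGeomPtr[0] = ...
userGeomPtr[1] = ...
void* userPtr = nullptr;          // passed back as the first argument of the bounds function
unsigned geomID = rtcNewUserGeometry3(scene, 2, numTimeSteps);
rtcSetUserData(scene, geomID, userGeomPtr);
rtcSetBoundsFunction3(scene, geomID, userBoundsFunction, userPtr);
rtcSetIntersectFunction(scene, geomID, userIntersectFunction);
rtcSetOccludedFunction(scene, geomID, userOccludedFunction);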

See tutorial User Geometry for an example of how to use the user defined geometries.

Instances

Embree supports instancing a scene inside another scene using a transformation. As the instanced scene is stored only once, even if it is instanced at multiple locations, this feature can be used to create very large scenes. Embree natively supports only single-level instancing; however, multi-level instancing can be implemented through user geometries.

Instances are created using the rtcNewInstance3(RTCScene target, RTCScene source, size_t numTimeSteps) function call, and potentially deleted using the rtcDeleteGeometry function call. To instantiate a scene, first create the scene B to be instantiated. An instance of this scene can then be added to a scene A as follows:

unsigned instID = rtcNewInstance3(sceneA, sceneB, 1);
rtcSetTransform2(sceneA, instID, RTC_MATRIX_COLUMN_MAJOR, &column_matrix_3x4, 0);

To create a motion blurred instance, pass the number of time steps and specify one matrix for each time step:

unsigned instID = rtcNewInstance3(sceneA, sceneB, 3);
rtcSetTransform2(sceneA, instID, RTC_MATRIX_COLUMN_MAJOR, &column_matrix_t0_3x4, 0);
rtcSetTransform2(sceneA, instID, RTC_MATRIX_COLUMN_MAJOR, &column_matrix_t1_3x4, 1);
rtcSetTransform2(sceneA, instID, RTC_MATRIX_COLUMN_MAJOR, &column_matrix_t2_3x4, 2);

Both scenes have to belong to the same device. One has to call rtcCommit on scene B before one calls rtcCommit on scene A. When modifying scene B one has to call rtcUpdate for all instances of that scene. If a ray hits the instance, then the geomID and primID members of the ray are set to the geometry ID and primitive ID of the primitive hit in scene B, and the instID member of the ray is set to the instance ID returned from the rtcNewInstance3 function.

Some special care has to be taken when using user geometries and instances in the same scene. Instantiated user geometries should not set the instID field of the ray as this field is managed by the instancing already. However, non-instantiated user geometries should clear the instID field to RTC_INVALID_GEOMETRY_ID, to later distinguish them from instantiated geometries that have the instID field set.

The rtcSetTransform2 call can be passed an affine transformation matrix with different data layouts:

Matrix layouts for rtcSetTransform2.

Layout                              Description
----------------------------------  ---------------------------------------------------------------
RTC_MATRIX_ROW_MAJOR                The 3×4 float matrix is laid out in row major form.
RTC_MATRIX_COLUMN_MAJOR             The 3×4 float matrix is laid out in column major form.
RTC_MATRIX_COLUMN_MAJOR_ALIGNED16   The 3×4 float matrix is laid out in column major form, with
                                    each column padded by an additional 4th component.

Passing homogeneous 4×4 matrices is possible as long as the last row is (0, 0, 0, 1). If this homogeneous matrix is laid out in row major form, use the RTC_MATRIX_ROW_MAJOR layout. If it is laid out in column major form, use the RTC_MATRIX_COLUMN_MAJOR_ALIGNED16 layout. In both cases, Embree will ignore the last row of the matrix.
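To make the layouts concrete, the sketch below converts a homogeneous 4×4 row-major matrix (last row (0, 0, 0, 1)) into the float orders of RTC_MATRIX_COLUMN_MAJOR and RTC_MATRIX_COLUMN_MAJOR_ALIGNED16. For RTC_MATRIX_ROW_MAJOR, the first 12 floats of such a matrix can be passed unchanged. The function names are hypothetical.

```cpp
#include <cassert>

// m: homogeneous 4x4 affine matrix in row-major order, m[row*4 + col],
// whose last row must be (0, 0, 0, 1).
static bool lastRowIsAffine(const float m[16])
{
  return m[12] == 0.0f && m[13] == 0.0f && m[14] == 0.0f && m[15] == 1.0f;
}

// RTC_MATRIX_COLUMN_MAJOR order: 4 columns of 3 floats each (12 floats).
void toColumnMajor3x4(const float m[16], float out[12])
{
  assert(lastRowIsAffine(m));
  for (int col = 0; col < 4; ++col)
    for (int row = 0; row < 3; ++row)
      out[col*3 + row] = m[row*4 + col];
}

// RTC_MATRIX_COLUMN_MAJOR_ALIGNED16 order: 4 columns of 4 floats each; the 4th
// float of each column is the padding component (here the homogeneous row).
void toColumnMajorAligned16(const float m[16], float out[16])
{
  assert(lastRowIsAffine(m));
  for (int col = 0; col < 4; ++col)
    for (int row = 0; row < 4; ++row)
      out[col*4 + row] = m[row*4 + col];
}
```

The aligned layout is exactly a 4×4 column-major matrix, which is why a homogeneous column-major matrix can be passed with RTC_MATRIX_COLUMN_MAJOR_ALIGNED16 directly.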

The transformation passed to rtcSetTransform2 transforms from the local space of the instantiated scene to world space.
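Since the matrix maps from the instantiated scene's local space to world space, an application can apply it as below, for example to bring a local-space point computed from instance data into world space. The sketch assumes the RTC_MATRIX_COLUMN_MAJOR order, and the function names are hypothetical. Normals such as Ng, which is reported in object space, would instead need the inverse transpose of the linear part, which this sketch does not compute.

```cpp
#include <cassert>

// xfm holds a 3x4 matrix in RTC_MATRIX_COLUMN_MAJOR order: the columns
// vx, vy, vz (linear part) and p (translation), 3 floats each.
// Maps a point from the instantiated scene's local space to world space.
void localToWorldPoint(const float xfm[12], const float in[3], float out[3])
{
  assert(in != out && "in and out must not alias");
  for (int row = 0; row < 3; ++row)
    out[row] = xfm[0 + row]*in[0] + xfm[3 + row]*in[1] + xfm[6 + row]*in[2] + xfm[9 + row];
}

// Directions (e.g. a ray direction or a curve tangent) ignore the translation column.
void localToWorldVector(const float xfm[12], const float in[3], float out[3])
{
  assert(in != out && "in and out must not alias");
  for (int row = 0; row < 3; ++row)
    out[row] = xfm[0 + row]*in[0] + xfm[3 + row]*in[1] + xfm[6 + row]*in[2];
}
```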

See tutorial Instanced Geometry for an example of how to use instances.

Ray Layout

The ray layout to be passed to the ray tracing core is defined in the embree2/rtcore_ray.h header file. The user can either use the ray structures defined in that file or reproduce the exact same binary data layout with their own vector classes. The ray layout might change with new Embree releases as new features are added; however, it will stay constant as long as the major Embree release number does not change. The ray contains the following data members:

Data fields of a ray.

Member   In/Out   Description
-------  -------  -----------------------------------------------------------
org      in       ray origin
dir      in       ray direction (can be unnormalized)
tnear    in       start of ray segment
tfar     in/out   end of ray segment, set to hit distance after intersection
time     in       time used for multi-segment motion blur [0,1]
mask     in       ray mask to mask out geometries
Ng       out      unnormalized geometry normal in object space
u        out      barycentric u-coordinate of hit
v        out      barycentric v-coordinate of hit
geomID   out      geometry ID of hit geometry
primID   out      primitive ID of hit primitive
instID   out      instance ID of hit instance

This structure is in struct of array layout (SOA) for API functions accepting ray packets.

To create a single ray you can use the RTCRay ray type defined in embree2/rtcore_ray.h. To generate a ray packet of size 4, 8, or 16 you can use the RTCRay4, RTCRay8, or RTCRay16 types. Alternatively you can also use the RTCRayNt template to generate ray packets of an arbitrary compile time known size.

When the ray packet size is not known at compile time (e.g. when Embree returns a ray packet in the RTCFilterFuncN callback function), you can use the helper functions defined in embree2/rtcore_ray.h to access ray packet components:

float& RTCRayN_org_x(RTCRayN* rays, size_t N, size_t i);
float& RTCRayN_org_y(RTCRayN* rays, size_t N, size_t i);
float& RTCRayN_org_z(RTCRayN* rays, size_t N, size_t i);

float& RTCRayN_dir_x(RTCRayN* rays, size_t N, size_t i);
float& RTCRayN_dir_y(RTCRayN* rays, size_t N, size_t i);
float& RTCRayN_dir_z(RTCRayN* rays, size_t N, size_t i);

float& RTCRayN_tnear(RTCRayN* rays, size_t N, size_t i);
float& RTCRayN_tfar (RTCRayN* rays, size_t N, size_t i);

float&    RTCRayN_time(RTCRayN* ptr, size_t N, size_t i);
unsigned& RTCRayN_mask(RTCRayN* ptr, size_t N, size_t i);

float& RTCRayN_Ng_x(RTCRayN* ptr, size_t N, size_t i);
float& RTCRayN_Ng_y(RTCRayN* ptr, size_t N, size_t i);
float& RTCRayN_Ng_z(RTCRayN* ptr, size_t N, size_t i);

float& RTCRayN_u   (RTCRayN* ptr, size_t N, size_t i);
float& RTCRayN_v   (RTCRayN* ptr, size_t N, size_t i);

unsigned& RTCRayN_instID(RTCRayN* ptr, size_t N, size_t i);
unsigned& RTCRayN_geomID(RTCRayN* ptr, size_t N, size_t i);
unsigned& RTCRayN_primID(RTCRayN* ptr, size_t N, size_t i);

These helper functions take a pointer to the ray packet (rays parameter), the packet size N, and a ray index i, and return a reference to one component (e.g. the x-component of the origin) of the ith ray of the packet.

Note that the layout of a single ray (RTCRay type) is incompatible with that of a ray packet of size 1 (RTCRayNt<1> type), as the org and dir components are aligned to 16 bytes for single rays (see embree2/rtcore_ray.h). This incompatibility will be resolved in a future release but currently has to be maintained for compatibility. Until then, the ray stream API always uses the single ray layout RTCRay for ray packets of size N=1, and the RTCRayNt layout for ray packets of any other size. The helper functions above for accessing a ray packet of size N take care of this incompatibility.

Some callback functions get passed a hit structure with the following data members:

Data fields of a hit.

Member   In/Out   Description
-------  -------  ---------------------------------------------
instID   in       instance ID of hit instance
geomID   in       geometry ID of hit geometry
primID   in       primitive ID of hit primitive
u        in       barycentric u-coordinate of hit
v        in       barycentric v-coordinate of hit
t        in       hit distance
Ng       in       unnormalized geometry normal in object space

This structure is in struct of array layout (SOA) for hit packets of size N. The layout of a hit packet of size N is defined by the RTCHitNt template in embree2/rtcore_ray.h.

When the hit packet size is not known at compile time (e.g. when Embree returns a hit packet in the RTCFilterFuncN callback function), you can use the helper functions defined in embree2/rtcore_ray.h to access hit packet components:

unsigned& RTCHitN_instID(RTCHitN* hits, size_t N, size_t i);
unsigned& RTCHitN_geomID(RTCHitN* hits, size_t N, size_t i);
unsigned& RTCHitN_primID(RTCHitN* hits, size_t N, size_t i);

float& RTCHitN_u   (RTCHitN* hits, size_t N, size_t i);
float& RTCHitN_v   (RTCHitN* hits, size_t N, size_t i);
float& RTCHitN_t   (RTCHitN* hits, size_t N, size_t i);

float& RTCHitN_Ng_x(RTCHitN* hits, size_t N, size_t i);
float& RTCHitN_Ng_y(RTCHitN* hits, size_t N, size_t i);
float& RTCHitN_Ng_z(RTCHitN* hits, size_t N, size_t i);

These helper functions take a pointer to the hit packet (hits parameter), the packet size N, and a hit index i, and return a reference to one component (e.g. the u-component) of the ith hit of the packet.

Ray Queries

The API supports finding the closest hit of a ray segment with the scene (rtcIntersect functions), and determining if any hit between a ray segment and the scene exists (rtcOccluded functions).

Normal Mode

In normal mode the following API functions should be used to trace rays:

void rtcIntersect  (                   RTCScene scene, RTCRay&    ray);
void rtcIntersect4 (const void* valid, RTCScene scene, RTCRay4&   ray);
void rtcIntersect8 (const void* valid, RTCScene scene, RTCRay8&   ray);
void rtcIntersect16(const void* valid, RTCScene scene, RTCRay16&  ray);
void rtcOccluded   (                   RTCScene scene, RTCRay&    ray);
void rtcOccluded4  (const void* valid, RTCScene scene, RTCRay4&   ray);
void rtcOccluded8  (const void* valid, RTCScene scene, RTCRay8&   ray);
void rtcOccluded16 (const void* valid, RTCScene scene, RTCRay16&  ray);

The rtcIntersect and rtcOccluded functions operate on single rays. The rtcIntersect4 and rtcOccluded4 functions operate on ray packets of size 4. The rtcIntersect8 and rtcOccluded8 functions operate on ray packets of size 8, and the rtcIntersect16 and rtcOccluded16 functions operate on ray packets of size 16.

For the ray packet modes with packet sizes of 4, 8, or 16, the user has to provide a pointer to 4, 8, or 16 32-bit integers that act as a ray activity mask (valid argument). If one of these integers is set to 0x00000000, the corresponding ray is considered inactive, and if the integer is set to 0xFFFFFFFF, the ray is considered active. Rays that are inactive will not update any hit information.
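A valid mask for a partially filled packet, for example at the edge of an image tile, can be built as below. The sketch shows a packet of 4, and buildValidMask4 is a hypothetical helper. When the mask is passed to rtcIntersect4 it also has to be 16-byte aligned, e.g. declared as alignas(16) std::int32_t valid[4].

```cpp
#include <cassert>
#include <cstdint>

// Fills the activity mask of a 4-wide packet: lanes [0, numActive) become
// active (-1, i.e. all bits set, 0xFFFFFFFF), the remaining lanes inactive (0).
void buildValidMask4(std::int32_t valid[4], int numActive)
{
  assert(numActive >= 0 && numActive <= 4);
  for (int i = 0; i < 4; ++i)
    valid[i] = (i < numActive) ? -1 : 0;
}
```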

Finding the closest hit distance is done through the rtcIntersect type functions. These get the activity mask (valid parameter), the scene (scene parameter), and a ray as input (ray parameter). The layout of the ray structure is described in Section Ray Layout. The user has to initialize the ray origin (org), ray direction (dir), and ray segment (tnear, tfar). The ray segment has to be in the range [0, ∞]; thus ranges that start behind the ray origin are not valid, but ranges can reach to infinity. The implementation makes no guarantees whether primitives whose hit distance is exactly at (or very close to) tnear or tfar are hit or missed. If you want to exclude intersections at tnear, pass a slightly enlarged tnear, and if you want to include intersections at tfar, pass a slightly enlarged tfar to Embree.

The geometry ID (geomID member) has to be initialized to RTC_INVALID_GEOMETRY_ID (-1). If the scene contains instances, the instance ID (instID) also has to be initialized to RTC_INVALID_GEOMETRY_ID (-1). If the scene contains motion blur geometries, the ray time (time) also has to be initialized to a value in the range [0, 1]. If ray masks are enabled at compile time, the ray mask (mask) also has to be initialized.

After tracing the ray, the hit distance (tfar), geometry normal (Ng, see footnote 1), local hit coordinates (u, v), geometry ID (geomID), and primitive ID (primID) are set. If the scene contains instances and an instance is hit, the instance ID (instID) is set as well. The geometry ID corresponds to the ID returned at creation time of the hit geometry, and the primitive ID corresponds to the nth primitive of that geometry, e.g. the nth triangle. The instance ID corresponds to the ID returned at creation time of the instance.
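The initialization rules above can be captured in a small debug check. The sketch uses a hypothetical RayInput struct holding only the fields those rules mention (not the RTCRay layout), and kInvalidID spells out the value of RTC_INVALID_GEOMETRY_ID, which is (unsigned)-1. The relative epsilon used to enlarge tnear or tfar is an application choice; a value that works depends on the scene scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Value of RTC_INVALID_GEOMETRY_ID, which is (unsigned)-1.
const unsigned kInvalidID = 0xFFFFFFFFu;

// Hypothetical subset of the ray inputs named above (not the RTCRay layout).
struct RayInput { float tnear, tfar, time; unsigned geomID, instID; };

// Full segment [0, inf], time 0, IDs reset as required before tracing.
RayInput defaultRayInput()
{
  return {0.0f, std::numeric_limits<float>::infinity(), 0.0f, kInvalidID, kInvalidID};
}

// Debug check for the documented initialization rules.
bool rayInputValid(const RayInput& r)
{
  return r.tnear >= 0.0f && r.tfar >= 0.0f         // segment within [0, inf], also rejects NaN
      && r.time >= 0.0f && r.time <= 1.0f          // motion blur time in [0, 1]
      && r.geomID == kInvalidID && r.instID == kInvalidID;
}

// Moves a bound outward by a relative epsilon chosen by the application,
// since hits exactly at tnear or tfar are not guaranteed either way.
float enlarge(float t, float eps)
{
  assert(eps > 0.0f);
  return t + eps * std::max(1.0f, std::fabs(t));
}
```

For a secondary ray starting on a surface, r.tnear = enlarge(0.0f, eps) excludes hits at the ray origin itself.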

Testing whether any geometry intersects with the ray segment is done through the rtcOccluded functions. Initialization has to be done as for rtcIntersect. If any geometry is found along the ray segment, the geometry ID (geomID) is set to 0. Other hit information of the ray is undefined after calling rtcOccluded.

In normal mode, the data alignment requirement for ray query functions operating on single rays is 16 bytes for the ray. The data alignment requirements for query functions operating on SOA packets of 4, 8, or 16 rays are 16, 32, and 64 bytes respectively, for both the valid mask and the ray. To operate on packets of 4 rays, the CPU has to support SSE; to operate on packets of 8 rays, the CPU has to support AVX; and to operate on packets of 16 rays, the CPU has to support AVX-512 instructions. Additionally, the required ISA has to be enabled in Embree at compile time to use the desired packet size.
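One way to meet these alignment requirements in C++ is alignas on the packet type. The sketch below uses stand-in SOA packet structs with a few components each (not the real RTCRay4/8/16 definitions) plus a runtime check for storage that comes from elsewhere, such as a custom allocator. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical SOA packet stand-in (not RTCRay4/8/16): one array per component,
// aligned to the documented 16/32/64-byte requirement for N = 4/8/16.
template <int N, std::size_t Align>
struct alignas(Align) RayPacketSOA {
  float org_x[N], org_y[N], org_z[N];
  float dir_x[N], dir_y[N], dir_z[N];
  float tnear[N], tfar[N];
};

using RayPacket4  = RayPacketSOA<4, 16>;
using RayPacket8  = RayPacketSOA<8, 32>;
using RayPacket16 = RayPacketSOA<16, 64>;

static_assert(alignof(RayPacket8) == 32, "8-wide packets need 32-byte alignment");
static_assert(alignof(RayPacket16) == 64, "16-wide packets need 64-byte alignment");

// Runtime check for packets or masks whose storage comes from elsewhere.
bool isAligned(const void* p, std::size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```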

The following code shows an example of setting up a single ray and tracing it through the scene:

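#include <limits>

RTCRay ray;
// ray_origin and ray_direction are float[3] arrays provided by the application
ray.org[0] = ray_origin[0];    ray.org[1] = ray_origin[1];    ray.org[2] = ray_origin[2];
ray.dir[0] = ray_direction[0]; ray.dir[1] = ray_direction[1]; ray.dir[2] = ray_direction[2];
ray.tnear = 0.0f;
ray.tfar = std::numeric_limits<float>::infinity();
ray.instID = RTC_INVALID_GEOMETRY_ID;
ray.geomID = RTC_INVALID_GEOMETRY_ID;
ray.primID = RTC_INVALID_GEOMETRY_ID;
ray.mask = 0xFFFFFFFF;
ray.time = 0.0f;
rtcIntersect(scene, ray);
if (ray.geomID != RTC_INVALID_GEOMETRY_ID) {
  // hit: tfar, Ng, u, v, geomID, and primID now describe the closest hit
}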

See tutorial Triangle Geometry for a complete example of how to trace rays.

Ray Stream Mode

Stream mode introduces new functions that operate on streams of rays:

void rtcIntersect1M    (RTCScene scene, const RTCIntersectContext* context,
                        RTCRay* rays, size_t M, size_t stride);
void rtcIntersect1Mp   (RTCScene scene, const RTCIntersectContext* context,
                        RTCRay**rays, size_t M);
void rtcIntersectNM    (RTCScene scene, const RTCIntersectContext* context,
                        RTCRayN* rays, size_t N, size_t M, size_t stride);
void rtcIntersectNp    (RTCScene scene, const RTCIntersectContext* context,
                        RTCRayNp& rays, size_t N);

void rtcOccluded1M     (RTCScene scene, const RTCIntersectContext* context,
                        RTCRay* rays, size_t M, size_t stride);
void rtcOccluded1Mp    (RTCScene scene, const RTCIntersectContext* context,
                        RTCRay** rays, size_t M);
void rtcOccludedNM     (RTCScene scene, const RTCIntersectContext* context,
                        RTCRayN* rays, size_t N, size_t M, size_t stride);
void rtcOccludedNp     (RTCScene scene, const RTCIntersectContext* context,
                        RTCRayNp& rays, size_t N);

The rtcIntersectNM and rtcOccludedNM ray stream functions operate on an array of M ray packets of packet size N. The offset in bytes between consecutive ray packets is specified by the stride parameter. The data alignment requirement for ray streams is 16 bytes. The packet size N has to be larger than 0, and the stream size M can be any non-negative integer, including 0. For example, tracing a ray stream consisting of four 8-wide SOA ray packets only requires setting N to 8, M to 4, and stride to sizeof(RTCRay8). A ray in a ray stream is considered inactive during traversal/intersection if its tnear value is larger than its tfar value.
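Because inactivity is expressed through tnear > tfar, rays can be switched off inside a stream without compacting it. The sketch uses a hypothetical StreamSegment record holding only tnear and tfar, and also shows the byte offset of packet m implied by the stride parameter. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-ray record holding only the segment bounds of a stream ray.
struct StreamSegment { float tnear, tfar; };

// A stream ray is inactive when tnear > tfar, so it can be switched off in place.
void deactivate(StreamSegment& s)
{
  assert(s.tnear >= 0.0f);
  s.tfar = -1.0f;                 // any value below tnear (which is >= 0) works
}

bool isActive(const StreamSegment& s) { return !(s.tnear > s.tfar); }

std::size_t countActive(const StreamSegment* segs, std::size_t M)
{
  std::size_t n = 0;
  for (std::size_t m = 0; m < M; ++m)
    if (isActive(segs[m])) ++n;
  return n;
}

// Byte offset of packet m in a stream passed as (rays, N, M, stride);
// for a stream of RTCRay8 packets the stride would be sizeof(RTCRay8).
std::size_t packetOffset(std::size_t m, std::size_t stride) { return m * stride; }
```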

The ray stream functions rtcIntersect1M and rtcOccluded1M are shortcuts for single ray streams with a packet size of N=1. rtcIntersect1Mp and rtcOccluded1Mp are similar to rtcIntersect1M and rtcOccluded1M, but take a stream of pointers to single rays as input. The rtcIntersectNp and rtcOccludedNp functions do not require the individual components of the SOA ray packets to be stored sequentially in memory, but at different addresses as specified in the RTCRayNp structure.

The intersection context passed to the stream versions of the ray query functions can specify some intersection flags to optimize traversal, and a userRayExt pointer that can be used to extend the ray with additional data, as described in Section Extending the Ray Structure. The intersection context is propagated to each stream user callback function invoked.

struct RTCIntersectContext
{
  RTCIntersectFlags flags;   //!< intersection flags
  void* userRayExt;          //!< can be used to pass extended ray data to callbacks
};

As intersection flag the user can currently specify if Embree should optimize traversal for coherent or incoherent ray distributions.

enum RTCIntersectFlags
{
  RTC_INTERSECT_COHERENT   = 0,  //!< optimize for coherent rays
  RTC_INTERSECT_INCOHERENT = 1   //!< optimize for incoherent rays
};

The following code shows an example of setting up a stream of single rays and tracing it through the scene:

#include <limits>

RTCRay rays[128];

/* first setup all rays */
for (size_t i=0; i<128; i++)
{
  calculate_ray_org(i, rays[i].org);  // hypothetical helper writing 3 floats
  calculate_ray_dir(i, rays[i].dir);  // hypothetical helper writing 3 floats
  rays[i].tnear = 0.0f;
  rays[i].tfar = std::numeric_limits<float>::infinity();
  rays[i].instID = RTC_INVALID_GEOMETRY_ID;
  rays[i].geomID = RTC_INVALID_GEOMETRY_ID;
  rays[i].primID = RTC_INVALID_GEOMETRY_ID;
  rays[i].mask = 0xFFFFFFFF;
  rays[i].time = 0.0f;
}

/* now create a context and trace the ray stream */
RTCIntersectContext context;
context.flags = RTC_INTERSECT_INCOHERENT;
context.userRayExt = nullptr;
/* packets of size N=1 use the RTCRay layout; equivalent to
   rtcIntersect1M(scene, &context, rays, 128, sizeof(RTCRay)) */
rtcIntersectNM(scene, &context, (RTCRayN*)rays, 1, 128, sizeof(RTCRay));

See tutorial Stream Viewer for a complete example of how to trace ray streams.

Interpolation of Vertex Data

Smooth interpolation of per-vertex data is supported for triangle meshes, quad meshes, hair geometry, line segment geometry, and subdivision geometry using the rtcInterpolate2 API call. This interpolation function ignores displacements and always interpolates the underlying base surface.

void rtcInterpolate2(RTCScene scene,
                     unsigned geomID, unsigned primID,
                     float u, float v,
                     RTCBufferType buffer,
                     float* P,
                     float* dPdu, float* dPdv,
                     float* ddPdudu, float* ddPdvdv, float* ddPdudv,
                     size_t numFloats);

This call smoothly interpolates the per-vertex data stored in the specified geometry buffer (buffer parameter) to the u/v location (u and v parameters) of the primitive (primID parameter) of the geometry (geomID parameter) of the specified scene (scene parameter). The interpolation buffer (buffer parameter) has to contain (at least) numFloats floating point values per vertex to interpolate. As interpolation buffer, one can specify RTC_VERTEX_BUFFER0 and RTC_VERTEX_BUFFER1, as well as one of the two special user vertex buffers RTC_USER_VERTEX_BUFFER0 and RTC_USER_VERTEX_BUFFER1. These user vertex buffers can only be set using the rtcSetBuffer2 call; they cannot be managed internally by Embree, as they have no default layout. The last element of the buffer has to be padded to 16 bytes, such that it can be read safely using SSE instructions.

The rtcInterpolate2 call stores numFloats interpolated floating point values to the memory location pointed to by P. One can avoid storing the interpolated value by setting P to NULL.

The first order derivatives of the interpolation with respect to u and v are stored at the dPdu and dPdv memory locations. One can avoid storing first order derivatives by setting both dPdu and dPdv to NULL.

The second order derivatives are stored at the ddPdudu, ddPdvdv, and ddPdudv memory locations. One can avoid storing second order derivatives by setting these three pointers to NULL.

The RTC_INTERPOLATE algorithm flag of a scene has to be enabled to perform interpolations.

It is explicitly allowed to call this function on disabled geometries. This makes it possible to use a separate subdivision mesh with different vertex creases, edge creases, and boundary handling for interpolation of texture coordinates if that is necessary.

The applied interpolation will do linear interpolation for triangle and quad meshes, linear interpolation for line segments, cubic Bézier interpolation for hair, and apply the full subdivision rules for subdivision geometry.
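For triangles, the linear interpolation can be reproduced on the application side as below. The sketch assumes the barycentric convention P = (1 - u - v)*v0 + u*v1 + v*v2 for the hit coordinates, which makes dPdu = v1 - v0 and dPdv = v2 - v0 constant and all second derivatives zero. Like rtcInterpolate2, it skips an output when its pointer is null; the function name is hypothetical.

```cpp
#include <cassert>

// Linear interpolation of numFloats per-vertex values over a triangle, assuming
// P = (1-u-v)*v0 + u*v1 + v*v2; null output pointers are skipped.
void interpolateTriangle(const float* v0, const float* v1, const float* v2,
                         float u, float v, int numFloats,
                         float* P, float* dPdu, float* dPdv)
{
  assert(numFloats > 0);
  for (int k = 0; k < numFloats; ++k) {
    if (P)    P[k]    = (1.0f - u - v) * v0[k] + u * v1[k] + v * v2[k];
    if (dPdu) dPdu[k] = v1[k] - v0[k];   // constant over the triangle
    if (dPdv) dPdv[k] = v2[k] - v0[k];
  }
  // All second derivatives of a linear interpolant are zero.
}
```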

There is also a second interpolate call rtcInterpolateN2 that can be used for ray packets.

void rtcInterpolateN2(RTCScene scene, unsigned geomID,
                      const void* valid, const unsigned* primIDs,
                      const float* u, const float* v, size_t numUVs,
                      RTCBufferType buffer,
                      float* P,
                      float* dPdu, float* dPdv,
                      float* ddPdudu, float* ddPdvdv, float* ddPdudv,
                      size_t numFloats);

This call is similar to the first version, but gets passed numUVs many u/v coordinates and a valid mask (valid parameter) that specifies which of these coordinates are valid. The valid mask points to numUVs integers, where a value of -1 denotes valid and 0 invalid. If the valid pointer is NULL, all elements are considered valid. The destination arrays are filled in structure of array (SoA) layout. The value numUVs has to be divisible by 4, and the destination buffer has to be at least numFloats*numUVs elements large.
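The padding and SoA rules can be handled with small helpers like the ones below. They pad the number of query points up to a multiple of 4, mark the padding entries invalid, and read back float k of sample i, assuming the SoA destination stores all samples of float 0 first, then all samples of float 1, and so on. The primIDs, u, and v arrays would have to be padded to the same length. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// numUVs has to be a multiple of 4: round the number of query points up.
std::size_t paddedCount(std::size_t n) { return (n + 3) / 4 * 4; }

// Valid mask for an rtcInterpolateN2-style call: -1 = valid, 0 = invalid padding.
std::vector<int> makeValidMask(std::size_t numValid)
{
  std::vector<int> valid(paddedCount(numValid), 0);
  for (std::size_t i = 0; i < numValid; ++i)
    valid[i] = -1;
  return valid;
}

// Reads float k of sample i from an SoA destination array, assuming
// component k of all numUVs samples is stored contiguously at k*numUVs.
float soaGet(const float* out, std::size_t numUVs, std::size_t i, std::size_t k)
{
  assert(i < numUVs);
  return out[k * numUVs + i];
}
```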

See tutorial Interpolation for an example of using the rtcInterpolate2 function.

Buffer Sharing

Embree supports sharing of buffers with the application. Each buffer that can be mapped for a specific geometry can also be shared with the application, by passing a pointer, offset, stride, and number of elements of the application side buffer using the rtcSetBuffer2 API function.

void rtcSetBuffer2(RTCScene scene, unsigned geomID, RTCBufferType type,
                  void* ptr, size_t offset, size_t stride, size_t size);

The rtcSetBuffer2 function has to be called before any call to rtcMapBuffer for that buffer; otherwise the buffer will be allocated internally and the call to rtcSetBuffer2 will fail. The buffer has to remain valid as long as the geometry exists, and the user is responsible for freeing the buffer when the geometry is deleted. When a buffer is shared, it is safe to modify that buffer without mapping and unmapping it. However, for dynamic scenes one still has to call rtcUpdate for modified geometries, and the buffer data has to stay constant from the rtcCommit call until after the last ray query invocation.

The offset parameter specifies a byte offset to the start of the first element, the stride parameter specifies the byte stride between consecutive elements of the shared buffer, and the size parameter specifies the number of elements stored inside the buffer. This support for offset and stride gives the application considerable freedom in the data layout of these buffers; however, some restrictions apply. Index buffers always store 32 bit indices and vertex buffers always store single precision floating point data. The start address ptr+offset and the stride always have to be aligned to 4 bytes; otherwise the rtcSetBuffer2 function will fail. The size parameter can be used to change the size of a buffer, which makes it possible to change the number of elements inside a mesh (by changing the size of the RTC_INDEX_BUFFER).

For vertex buffers (RTC_VERTEX_BUFFER and RTC_USER_VERTEX_BUFFER), the last element must be readable using 16-byte SSE loads. For some layouts (e.g. tightly packed 3-float vertices), this requires padding after the last element so that 16 bytes can be read starting at its address.
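The addressing, alignment, and padding rules above can be checked on the application side before calling rtcSetBuffer2. The helpers below are hypothetical (not Embree API) and only encode the rules as stated here: 4-byte alignment of ptr+offset and stride, and 16 readable bytes starting at the last element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Checks the 4-byte alignment rule for the start address ptr+offset and the stride.
bool sharedBufferLayoutOk(const void* ptr, std::size_t offset, std::size_t stride) {
  const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(ptr) + offset;
  return start % 4 == 0 && stride % 4 == 0;
}

// Address of element i of a shared buffer described by (ptr, offset, stride).
const float* elementPtr(const void* ptr, std::size_t offset, std::size_t stride,
                        std::size_t i) {
  return reinterpret_cast<const float*>(static_cast<const char*>(ptr) + offset + i * stride);
}

// Minimum allocation (bytes) so that a 16-byte load starting at the last
// element stays inside the buffer.
std::size_t paddedVertexBufferBytes(std::size_t offset, std::size_t stride,
                                    std::size_t count) {
  if (count == 0) return offset;
  const std::size_t lastStart = offset + (count - 1) * stride;
  return lastStart + 16;
}
```

For three tightly packed float3 vertices (stride 12), this yields 40 bytes instead of 36, i.e. one extra float of padding; with a 16-byte stride no extra padding is needed.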

The following is an example of how to create a mesh with shared index and vertex buffers:

unsigned geomID = rtcNewTriangleMesh(scene, geomFlags, numTriangles, numVertices);
rtcSetBuffer2(scene, geomID, RTC_VERTEX_BUFFER, vertexPtr, 0, 3*sizeof(float), numVertices);
rtcSetBuffer2(scene, geomID, RTC_INDEX_BUFFER, indexPtr, 0, 3*sizeof(int), numTriangles);

Sharing buffers can significantly reduce the memory required by the application, thus we recommend using this feature. When enabling the RTC_COMPACT scene flag, the spatial index structures of Embree might also share the vertex buffer, resulting in even higher memory savings.

Multi-Segment Motion Blur

All geometry types support multi-segment motion blur with equidistant time steps and an arbitrary number of time steps in the range of 2 to 129. Each geometry can have a different number of time steps. A motion blur geometry is constructed by passing the number of time steps to the geometry construction function and setting the vertex arrays RTC_VERTEX_BUFFER0+t for each time step t:

unsigned geomID = rtcNewTriangleMesh(scene, geomFlags, numTris, numVertices, 3);
rtcSetBuffer2(scene, geomID, RTC_VERTEX_BUFFER0+0, vertex0Ptr, 0, sizeof(Vertex), numVertices);
rtcSetBuffer2(scene, geomID, RTC_VERTEX_BUFFER0+1, vertex1Ptr, 0, sizeof(Vertex), numVertices);
rtcSetBuffer2(scene, geomID, RTC_VERTEX_BUFFER0+2, vertex2Ptr, 0, sizeof(Vertex), numVertices);
rtcSetBuffer2(scene, geomID, RTC_INDEX_BUFFER, indexPtr, 0, sizeof(Triangle), numTris);

If a scene contains geometries with motion blur, the user has to set the time member of the ray to a value in the range [0, 1]. The motion blur geometry is defined by linearly interpolating the geometries of neighboring time steps. Each ray can specify a different time, even inside a ray packet.
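To make the interpolation rule concrete, the sketch below computes which pair of neighboring time steps a ray time falls between and interpolates a vertex linearly. It assumes the numTimeSteps steps are spread uniformly over [0, 1] (step s at time s/(numTimeSteps-1)), which is our reading of "equidistant"; Vec3, timeSegment, and vertexAtTime are hypothetical helpers, not Embree functions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// For numTimeSteps equidistant steps, time t in [0,1] lies in segment `seg`
// (between steps seg and seg+1) at local fraction `f`.
void timeSegment(float t, unsigned numTimeSteps, unsigned& seg, float& f) {
  assert(numTimeSteps >= 2);
  const float scaled = t * float(numTimeSteps - 1);
  seg = std::min(unsigned(scaled), numTimeSteps - 2);  // t == 1 maps to the last segment
  f = scaled - float(seg);
}

// Linearly interpolates vertex v between the two neighboring time steps.
Vec3 vertexAtTime(const Vec3* const* steps, unsigned numTimeSteps, std::size_t v, float t) {
  unsigned seg;
  float f;
  timeSegment(t, numTimeSteps, seg, f);
  const Vec3& a = steps[seg][v];
  const Vec3& b = steps[seg + 1][v];
  return {a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]), a[2] + f * (b[2] - a[2])};
}
```

With 3 time steps, a ray time of 0.75 falls halfway between steps 1 and 2.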

User Data Pointer

A user data pointer can be specified and queried per geometry, to efficiently map from the geometry ID returned by ray queries to the application representation for that geometry.

void  rtcSetUserData (RTCScene scene, unsigned geomID, void* ptr);
void* rtcGetUserData (RTCScene scene, unsigned geomID);

The user data pointer of a user defined geometry is additionally passed to the intersect and occluded callback functions of that user geometry. Further, the user data pointer is also passed to intersection filter callback functions attached to a geometry.

The rtcGetUserData function is deliberately not thread safe with respect to other API calls that modify the scene. Consequently, this function can be used to efficiently query the user data pointer during rendering (also by multiple threads), but should not be called while other threads are modifying the scene.

Geometry Mask

A 32 bit geometry mask can be assigned to triangle meshes and hair geometries using the rtcSetMask call.

rtcSetMask(scene, geomID, mask);

Primitives of this geometry are hit by a ray only if the bitwise AND of this mask and the mask stored inside the ray is not 0. This feature can be used to disable selected triangle mesh or hair geometries for specifically tagged rays, e.g. to disable shadow casting for some geometry. This API feature is disabled in Embree by default at compile time and can be enabled in CMake through the EMBREE_RAY_MASK parameter.
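The mask test and the shadow-casting use case can be sketched as follows. The bit assignment (bit 0 for camera rays, bit 1 for shadow rays) is a hypothetical application convention, not something Embree defines.

```cpp
#include <cassert>
#include <cstdint>

// Primitives of a geometry are only hit if the ray mask and geometry mask share a bit.
inline bool maskAllowsHit(std::uint32_t rayMask, std::uint32_t geomMask) {
  return (rayMask & geomMask) != 0;
}

// Hypothetical application-defined ray categories.
constexpr std::uint32_t CAMERA_RAY_BIT = 1u << 0;
constexpr std::uint32_t SHADOW_RAY_BIT = 1u << 1;
```

A geometry that should not cast shadows gets the mask CAMERA_RAY_BIT only, while regular geometry gets CAMERA_RAY_BIT | SHADOW_RAY_BIT; shadow rays are then traced with mask SHADOW_RAY_BIT.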

Filter Functions

The API supports per geometry filter callback functions that are invoked for each intersection found during the rtcIntersect or rtcOccluded calls. The former are called intersection filter functions, the latter occlusion filter functions. The filter functions can be used to implement various useful features, such as accumulating opacity for transparent shadows, counting the number of surfaces along a ray, collecting all hits along a ray, etc. Filter functions can also be used to selectively reject hits to enable backface culling for some geometries. If backfaces should be culled in general for all geometries, it is faster to enable EMBREE_BACKFACE_CULLING when compiling Embree than to use filter functions.

If the RTC_SCENE_HIGH_QUALITY mode is set, the intersection and occlusion filter functions may be called multiple times for the same hit. For some usage scenarios, the application may have to work around this by collecting already reported hits (geomID/primID pairs) and ignoring duplicates.
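A minimal sketch of the duplicate filtering described above, assuming one record per ray that the filter function can reach (for example through an extended ray). ReportedHits is a hypothetical helper; for instanced geometry the key would likely need the instID as well.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Per-ray record of hits already reported to a filter function.
struct ReportedHits {
  std::set<std::pair<unsigned, unsigned>> seen;  // (geomID, primID)

  // Returns true the first time a (geomID, primID) pair is reported and
  // false for repeated invocations of the filter for the same hit.
  bool firstReport(unsigned geomID, unsigned primID) {
    return seen.insert({geomID, primID}).second;
  }
};
```

Inside an occlusion filter that counts surfaces, the counter would only be incremented when firstReport returns true.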

Normal Mode

In normal mode the filter functions provided by the user need to have the following signature:

void RTCFilterFunc  (                   void* userDataPtr, RTCRay&   ray);
void RTCFilterFunc4 (const void* valid, void* userDataPtr, RTCRay4&  ray);
void RTCFilterFunc8 (const void* valid, void* userDataPtr, RTCRay8&  ray);
void RTCFilterFunc16(const void* valid, void* userDataPtr, RTCRay16& ray);

The valid pointer points to an integer valid mask (0 means invalid and -1 means valid). The userDataPtr is a user pointer optionally set per geometry through the rtcSetUserData function. All hit information inside the ray is valid. If the hit geometry is instanced, the instID member of the ray is valid and the ray origin, direction, and geometry normal visible through the ray are in object space.

The filter function can reject a hit by setting the geomID member of the ray to RTC_INVALID_GEOMETRY_ID; otherwise the hit is accepted. The filter function is not allowed to modify the ray input data (org, dir, time, mask, and tnear members), but can modify the hit data of the ray (u, v, Ng, tfar, geomID, primID, and instID members). Updating the tfar distance to a smaller value is possible without limitation. However, increasing the tfar distance of the ray to a larger value tfar' does not guarantee that intersections between tfar and tfar' are reported later, as the corresponding subtrees might have been culled already.
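The following sketch shows the shape of a normal-mode intersection filter that rejects hits on "cut out" primitives. To stay self-contained it uses a hypothetical stand-in MiniRay with only the members it touches instead of RTCRay, and assumes RTC_INVALID_GEOMETRY_ID equals (unsigned)-1; real code should include the Embree headers and use RTCRay and the macro. The user data pointer is assumed to point to a per-primitive cutout table.

```cpp
#include <cassert>
#include <vector>

// Stand-in for RTC_INVALID_GEOMETRY_ID (assumed value; use the macro in real code).
constexpr unsigned kInvalidGeometryID = static_cast<unsigned>(-1);

// Hypothetical stand-in for the hit members of RTCRay used below.
struct MiniRay {
  float tfar;
  unsigned geomID;
  unsigned primID;
};

// Same shape as RTCFilterFunc: (void* userDataPtr, Ray& ray).
// Rejects hits on primitives flagged in the cutout table; input members are untouched.
void cutoutFilter(void* userDataPtr, MiniRay& ray) {
  const auto* cutout = static_cast<const std::vector<bool>*>(userDataPtr);
  if (ray.primID < cutout->size() && (*cutout)[ray.primID])
    ray.geomID = kInvalidGeometryID;  // reject this hit
}
```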

The intersection and occlusion filter functions for different ray types are set for some geometry of a scene using the following API functions:

void rtcSetIntersectionFilterFunction  (RTCScene, unsigned geomID, RTCFilterFunc   filter);
void rtcSetIntersectionFilterFunction4 (RTCScene, unsigned geomID, RTCFilterFunc4  filter);
void rtcSetIntersectionFilterFunction8 (RTCScene, unsigned geomID, RTCFilterFunc8  filter);
void rtcSetIntersectionFilterFunction16(RTCScene, unsigned geomID, RTCFilterFunc16 filter);

void rtcSetOcclusionFilterFunction  (RTCScene, unsigned geomID, RTCFilterFunc   filter);
void rtcSetOcclusionFilterFunction4 (RTCScene, unsigned geomID, RTCFilterFunc4  filter);
void rtcSetOcclusionFilterFunction8 (RTCScene, unsigned geomID, RTCFilterFunc8  filter);
void rtcSetOcclusionFilterFunction16(RTCScene, unsigned geomID, RTCFilterFunc16 filter);

The intersection and occlusion filter functions of type RTCFilterFunc are only called by the rtcIntersect and rtcOccluded functions. Similarly, the filter functions of type RTCFilterFunc4, RTCFilterFunc8, and RTCFilterFunc16 are called by rtcIntersect4/8/16 and rtcOccluded4/8/16 of matching width.
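For the packet variants, the filter must only process lanes whose valid-mask entry is -1. The sketch below uses a hypothetical SoA stand-in MiniRay4 in place of RTCRay4 and an assumed value for RTC_INVALID_GEOMETRY_ID; it mirrors the RTCFilterFunc4 parameter order.

```cpp
#include <cassert>
#include <vector>

constexpr unsigned kInvalidGeometryID = static_cast<unsigned>(-1);  // assumed value

// Hypothetical SoA stand-in for the hit members of RTCRay4.
struct MiniRay4 {
  unsigned geomID[4];
  unsigned primID[4];
};

// Same parameter order as RTCFilterFunc4: (const void* valid, void* userDataPtr, Ray4& ray).
void cutoutFilter4(const void* valid, void* userDataPtr, MiniRay4& ray) {
  const int* validMask = static_cast<const int*>(valid);
  const auto* cutout = static_cast<const std::vector<bool>*>(userDataPtr);
  for (int i = 0; i < 4; ++i) {
    if (validMask[i] != -1) continue;  // inactive lane: leave it untouched
    if (ray.primID[i] < cutout->size() && (*cutout)[ray.primID[i]])
      ray.geomID[i] = kInvalidGeometryID;  // reject the hit in this lane
  }
}
```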

Stream Mode

For ray stream mode, a new type of filter function RTCFilterFuncN was introduced:

void RTCFilterFuncN (int* valid,
                     void* userDataPtr,
                     const RTCIntersectContext* context,
                     RTCRayN* ray,
                     const RTCHitN* potentialHit,
                     const size_t N);

The stream intersection and occlusion filter functions of this new type are set for some geometry of a scene using the following API functions:

void rtcSetIntersectionFilterFunctionN (RTCScene, unsigned geomID, RTCFilterFuncN filter);
void rtcSetOcclusionFilterFunctionN    (RTCScene, unsigned geomID, RTCFilterFuncN filter);

For the callback RTCFilterFuncN, the valid parameter points to an integer valid mask (0 means invalid and -1 means valid). The userDataPtr is a user pointer optionally set per geometry through the rtcSetUserData function. The context parameter points to the intersection context passed to the ray query function. The ray parameter contains the current ray. All hit data inside the ray are undefined, except the tfar value. The potentialHit parameter points to the new hit to test and update. The N parameter is the number of rays and hits found in the ray and potentialHit. If the hit geometry is instanced, the instID member of the ray is valid and the ray as well as the potential hit are in object space.

As the ray packet size N can be arbitrary, the ray and hit should be accessed through the helper functions as described in Section Ray Layout.

The callback function has the task to check for each valid ray whether it wants to accept or reject the corresponding hit. To reject a hit, the filter callback function just has to write 0 to the integer valid mask of the corresponding ray. The filter function is not allowed to modify the ray input data (org, dir, time, mask, and tnear members), nor the potential hit, nor inactive components.

An intersection filter callback function can accept a hit by updating all hit data members of the ray (u, v, Ng, tfar, geomID, primID, and instID members) and keep the valid mask set to -1.

An occlusion filter callback function can accept a hit by setting the geomID member of the ray to 0 and keep the valid mask set to -1.

The intersection filter callback of most applications will just copy the potentialHit into the appropriate fields of the ray, but this is not a requirement and the hit data of the ray can be modified arbitrarily. Updating the tfar distance to a smaller value (e.g. the t distance of the potential hit) is possible without limitation. However, increasing the tfar distance of the ray to a larger value tfar' does not guarantee that intersections between tfar and tfar' are reported later, as the corresponding subtrees might have been culled already.
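The accept/reject protocol of a stream intersection filter can be sketched as follows. StreamRays and PotentialHits are hypothetical SoA stand-ins for the data that a real callback reaches through the Ray Layout helper functions, the context parameter is omitted, and Ng and instID are left out for brevity (a real callback accepting a hit should update all hit members).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA view of the hit members of N rays.
struct StreamRays {
  float* tfar; float* u; float* v;
  unsigned* geomID; unsigned* primID;
};

// Hypothetical SoA view of N potential hits.
struct PotentialHits {
  const float* t; const float* u; const float* v;
  const unsigned* geomID; const unsigned* primID;
};

// Rejects hits on cut-out primitives by writing 0 to the valid mask, and
// accepts all other hits by copying the potential hit into the ray.
void cutoutFilterN(int* valid, void* userDataPtr, StreamRays& ray,
                   const PotentialHits& hit, std::size_t N) {
  const auto* cutout = static_cast<const std::vector<bool>*>(userDataPtr);
  for (std::size_t i = 0; i < N; ++i) {
    if (valid[i] != -1) continue;  // inactive component: do not touch
    if (hit.primID[i] < cutout->size() && (*cutout)[hit.primID[i]]) {
      valid[i] = 0;  // reject
      continue;
    }
    ray.tfar[i] = hit.t[i];  // accept: copy hit data, keep valid[i] == -1
    ray.u[i] = hit.u[i];
    ray.v[i] = hit.v[i];
    ray.geomID[i] = hit.geomID[i];
    ray.primID[i] = hit.primID[i];
  }
}
```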

Displacement Mapping Functions

The API supports displacement mapping for subdivision meshes. A displacement function can be set for a subdivision mesh using the rtcSetDisplacementFunction2 API call.

void rtcSetDisplacementFunction2(RTCScene, unsigned geomID, RTCDisplacementFunc2, RTCBounds*);

Passing NULL as the displacement function deletes an already set displacement function. The bounds parameter is optional. If NULL is passed as bounds, the displacement shader will be evaluated during the build process to properly bound the displaced geometry. If a pointer to bounds of the displacement is passed, the implementation can choose to use these bounds to bound the displaced geometry. When bounds are specified, they have to be conservative and should be tight for best performance.

The displacement function has to have the following type:

typedef void (*RTCDisplacementFunc2)(void* ptr,
                                     unsigned geomID, unsigned primID, unsigned timeStep,
                                     const float* u,  const float* v,
                                     const float* nx, const float* ny, const float* nz,
                                     float* px, float* py, float* pz,
                                     size_t N);

The displacement function is called with the user data pointer of the geometry (ptr), the geometry ID (geomID), and the primitive ID (primID) of a patch to displace. For motion blur, the time step timeStep is also specified, such that the function can be time varying. For the patch, a number N of points to displace are specified in a struct of array layout. For each point to displace, the local patch UV coordinates (u and v arrays), the normalized geometry normal (nx, ny, and nz arrays), as well as the world space position (px, py, and pz arrays) are provided. The task of the displacement function is to use this information to move each world space position, staying inside the specified displacement bounds around the point.

All passed arrays are guaranteed to be 64 bytes aligned, and properly padded to make wide vector processing inside the displacement function possible.
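A minimal sketch of a function with the RTCDisplacementFunc2 parameter list. The bump shape is a hypothetical choice: it displaces along the normal by kAmplitude*sin(pi*u)*sin(pi*v), which is 0 on every patch boundary (so neighboring patches cannot crack regardless of their u/v orientation) and never exceeds kAmplitude in magnitude, so bounds of plus or minus kAmplitude around the undisplaced surface would be conservative. The plain loop relies on the padding guarantee only for possible auto-vectorization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr float kAmplitude = 0.1f;  // hypothetical maximum displacement
constexpr float kPi = 3.14159265f;

// Same parameter list as RTCDisplacementFunc2; time-independent displacement.
void bumpDisplacement(void* /*ptr*/, unsigned /*geomID*/, unsigned /*primID*/,
                      unsigned /*timeStep*/, const float* u, const float* v,
                      const float* nx, const float* ny, const float* nz,
                      float* px, float* py, float* pz, std::size_t N) {
  assert(u && v && nx && ny && nz && px && py && pz);
  for (std::size_t i = 0; i < N; ++i) {
    const float h = kAmplitude * std::sin(kPi * u[i]) * std::sin(kPi * v[i]);
    px[i] += h * nx[i];  // move along the normalized geometry normal
    py[i] += h * ny[i];
    pz[i] += h * nz[i];
  }
}
```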

The displacement mapping functions might get called during the rtcCommit call, or lazily during the rtcIntersect or rtcOccluded calls.

Also see tutorial Displacement Geometry for an example of how to use the displacement mapping functions.

Extending the Ray Structure

Normal Mode

If Embree is used in normal mode, the ray passed to the filter callback functions and user geometry callback functions is guaranteed to be the same ray pointer initially provided to the ray query function by the user. For that reason, it is safe to extend the ray by additional data and access this data inside the filter callback functions (e.g. to accumulate opacity) and user geometry callback functions.

Stream Mode

If Embree is used in stream mode, the ray passed to the filter callback and user geometry callback functions is not guaranteed to be the same ray pointer initially passed to the ray query function, as the stream implementation may decide to copy rays around, reorder them, and change the data layout internally when appropriate (e.g. SOA to AOS conversion).

To identify specific rays in the callback functions, the user has to pass an ID with each ray and set the userRayExt member of the intersection context to point to its ray extensions. The ray extensions can be stored in a separate memory location, but also just after the end of each ordinary ray (or ray packet). In the latter case, you can just point userRayExt to the input rays.

To encode a ray ID the ray mask field can be used entirely when the ray mask feature is disabled, or unused bits of the ray mask can be used in case the ray mask feature is enabled (e.g. by using the lower 16 bits as ray ID, and the upper 16 bits as ray mask, and setting the lower 16 bits of each geometry mask always to 0).

The intersection context provided to the stream ray query functions is passed to each stream callback function (e.g. RTCIntersectFuncN, RTCIntersectFunc1Mp, or RTCFilterFuncN). Thus, in the callback function, the ray ID can get decoded, and the extended ray data accessed through the userRayExt pointer stored inside the intersection context. For SPMD type programs this access requires gather and scatter operations to access the user ray extensions.

Note that using the ray ID to access the ray extensions is necessary, as the position of a ray inside the packet seen by a callback might differ from its position in the packet passed to the ray query function. E.g. if you trace a ray packet with 8 rays 0 to 7, then even if a callback gets called with a ray packet of 8 rays, the rays might have been reordered. Further, the callback might get called with a subpacket of a size smaller than 8 (e.g. N=5). However, optimizing for the common case in which Embree keeps such a packet intact (thus having a special codepath for N=8 and unchanged IDs) can give higher performance.
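The ID encoding and the ID-based lookup can be sketched as below, using the split suggested above (lower 16 bits ray ID, upper 16 bits ray mask). RayExt, accumulateOpacity, and passing the extension array directly (rather than reading userRayExt from the intersection context) are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lower 16 bits: ray ID; upper 16 bits: ray mask (geometry masks keep their lower 16 bits 0).
inline std::uint32_t encodeRayMask(std::uint16_t rayID, std::uint16_t maskBits) {
  return (std::uint32_t(maskBits) << 16) | rayID;
}
inline std::uint16_t decodeRayID(std::uint32_t mask) {
  return std::uint16_t(mask & 0xFFFFu);
}

// Hypothetical per-ray extension stored separately from the rays.
struct RayExt { float opacity; };

// For each active lane, locate the extension by decoded ray ID (never by lane
// index, since rays may be reordered or split into subpackets).
void accumulateOpacity(const int* valid, const std::uint32_t* rayMasks, std::size_t N,
                       RayExt* userRayExt, float alpha) {
  for (std::size_t i = 0; i < N; ++i) {
    if (valid[i] != -1) continue;
    RayExt& ext = userRayExt[decodeRayID(rayMasks[i])];
    ext.opacity += (1.0f - ext.opacity) * alpha;
  }
}
```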

Sharing Threads with Embree

On some implementations, Embree supports using the application threads when building internal data structures, by using the

void rtcCommitThread(RTCScene, unsigned threadIndex, unsigned threadCount);

API call to commit the scene. This function has to get called by all threads that want to cooperate in the scene commit. Each call is provided the scene to commit, the index of the calling thread in the range [0, threadCount-1], and the number of threads that will call into this commit operation for the scene. All threads will return again from this function after the scene commit is finished.

Multiple such scene commit operations can also be running at the same time, e.g. it is possible to commit many small scenes in parallel using one thread per commit operation. Subsequent commit operations for the same scene can use a different number of threads in the rtcCommitThread call or use the Embree internal threads through the rtcCommit call.

Note: When using Embree with the Intel® Threading Building Blocks (which is the default), you should not use the rtcCommitThread function. Sharing your threads with TBB is not possible, and TBB will always create its own set of threads. We recommend also using TBB inside your application to share threads with the Embree library. When using TBB inside your application, never use the rtcCommitThread function.

Note: When enabling the Embree internal tasking system the rtcCommitThread feature will work as expected and use the application threads for hierarchy building.

Join Build Operation

The special rtcCommitJoin function can be used from multiple threads to join a scene build operation. All threads have to consistently call rtcCommitJoin and no other commit variant.

This feature provides a flexible way to lazily create hierarchies during rendering. A thread reaching a not yet constructed sub-scene of a two-level scene can generate the sub-scene geometry and call rtcCommitJoin on that newly generated scene. During construction, further threads reaching the not yet built scene can join the build operation by also invoking rtcCommitJoin. A thread that calls rtcCommitJoin after the build has finished returns directly from the rtcCommitJoin call (even for static scenes).

Note: When using Embree with the Intel® Threading Building Blocks, threads that call rtcCommitJoin will join the build operation, but other TBB worker threads might also participate in the build. To avoid thread oversubscription, we recommend also using TBB inside the application. Further, the join mode only works properly starting with TBB v4.4 Update 1. With earlier TBB versions, threads that call rtcCommitJoin to join a running build just wait for the build to finish.

Note: When using Embree with the internal tasking system, exclusively threads that call rtcCommitJoin will perform the build operation, and no additional worker threads are scheduled.

Memory Monitor Callback

Using the memory monitor callback mechanism, the application can track the memory consumption of an Embree device, and optionally terminate API calls that consume too much memory.

The user provided memory monitor callback function must have the following signature:

typedef bool (*RTCMemoryMonitorFunc2)(void* userPtr, const ssize_t bytes, const bool post);

A single such callback function per device can be registered by calling

void rtcDeviceSetMemoryMonitorFunction2(RTCDevice device, RTCMemoryMonitorFunc2 func, void* userPtr);

and deregistered again by calling it with NULL as the function pointer. Once registered, the Embree device will invoke the callback function before or after it allocates or frees important memory blocks. The userPtr value that is set at registration time is passed to each invocation of the callback function. The callback function might get called from multiple threads concurrently.

The application can track the current memory usage of the Embree device by atomically accumulating the provided bytes input parameter. This parameter will be >0 for allocations and <0 for deallocations. The post input parameter is true if the callback function was invoked after the allocation or deallocation, otherwise it is false.

Embree will continue its operation normally when the callback function returns true. If false is returned, Embree will cancel the current operation with the RTC_OUT_OF_MEMORY error code. Cancelling will only happen when the callback was called for allocations (bytes > 0); otherwise the cancel request will be ignored. If a callback that was invoked before the allocation happens (post == false) cancels the operation, the bytes parameter should not be accumulated, as the allocation will never happen. If a callback that was invoked after the allocation happened (post == true) cancels the operation, the bytes parameter should be accumulated, as the allocation did happen. Issuing multiple cancel requests for the same operation is allowed.
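The accounting rules above can be implemented with a single atomic counter. The sketch below enforces a hypothetical byte budget (MemoryBudget is not Embree API) and follows the stated post semantics: a cancelled pre-allocation callback undoes its contribution, a cancelled post-allocation callback keeps it. It assumes a POSIX system for ssize_t; under concurrency the counter may transiently overshoot, which this sketch tolerates.

```cpp
#include <atomic>
#include <cassert>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical budget object passed as userPtr at registration time.
struct MemoryBudget {
  std::atomic<long long> used{0};
  long long limit = 0;
};

// Same shape as RTCMemoryMonitorFunc2.
bool memoryMonitor(void* userPtr, const ssize_t bytes, const bool post) {
  auto* b = static_cast<MemoryBudget*>(userPtr);
  if (bytes <= 0) {  // deallocations are always accounted and never cancelled
    b->used.fetch_add(bytes);
    return true;
  }
  const long long after = b->used.fetch_add(bytes) + bytes;
  if (after <= b->limit) return true;
  if (!post) b->used.fetch_sub(bytes);  // allocation will never happen
  return false;                         // request cancellation (RTC_OUT_OF_MEMORY)
}
```

Registration would then look like rtcDeviceSetMemoryMonitorFunction2(device, memoryMonitor, &budget).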

Progress Monitor Callback

The progress monitor callback mechanism can be used to report the progress of hierarchy build operations and to cancel long-running build operations.

The user provided progress monitor callback function has to have the following signature:

typedef bool (*RTCProgressMonitorFunc)(void* userPtr, const double n);

A single such callback function can be registered per scene by calling

void rtcSetProgressMonitorFunction(RTCScene, RTCProgressMonitorFunc, void* userPtr);

and deregistered again by calling it with NULL for the callback function. Once registered Embree will invoke the callback function multiple times during hierarchy build operations of the scene, by providing the userPtr pointer that was set at registration time, and a double n in the range [0, 1] estimating the completion amount of the operation. The callback function might get called from multiple threads concurrently.

When returning true from the callback function, Embree will continue the build operation normally. When returning false Embree will cancel the build operation with the RTC_CANCELLED error code. Issuing multiple cancel requests for the same build operation is allowed.
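A minimal sketch of a progress callback that records the highest reported completion and lets another thread (e.g. a UI) request cancellation. BuildProgress is a hypothetical userPtr target; since the callback may run concurrently, the maximum is updated with a compare-and-swap loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical state passed as userPtr at registration time.
struct BuildProgress {
  std::atomic<bool> cancelRequested{false};
  std::atomic<double> maxProgress{0.0};
};

// Same shape as RTCProgressMonitorFunc.
bool progressMonitor(void* userPtr, const double n) {
  auto* p = static_cast<BuildProgress*>(userPtr);
  double prev = p->maxProgress.load();
  while (n > prev && !p->maxProgress.compare_exchange_weak(prev, n)) {
    // prev was reloaded by compare_exchange_weak; retry while n is still larger
  }
  return !p->cancelRequested.load();  // false asks Embree to cancel (RTC_CANCELLED)
}
```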

Configuring Embree

Some internal device parameters can be set and queried using the rtcDeviceSetParameter1i and rtcDeviceGetParameter1i API calls. The parameters from the following table are available to set/query:

Parameters for rtcDeviceSetParameter1i and rtcDeviceGetParameter1i.

| Parameter | Description | Read/Write |
|---|---|---|
| RTC_CONFIG_VERSION_MAJOR | returns the Embree major version | Read only |
| RTC_CONFIG_VERSION_MINOR | returns the Embree minor version | Read only |
| RTC_CONFIG_VERSION_PATCH | returns the Embree patch version | Read only |
| RTC_CONFIG_VERSION | returns the Embree version as an integer, e.g. Embree v2.8.2 → 20802 | Read only |
| RTC_CONFIG_INTERSECT1 | checks if rtcIntersect1 is supported | Read only |
| RTC_CONFIG_INTERSECT4 | checks if rtcIntersect4 is supported | Read only |
| RTC_CONFIG_INTERSECT8 | checks if rtcIntersect8 is supported | Read only |
| RTC_CONFIG_INTERSECT16 | checks if rtcIntersect16 is supported | Read only |
| RTC_CONFIG_INTERSECT_STREAM | checks if rtcIntersect1M, rtcIntersect1Mp, rtcIntersectNM, and rtcIntersectNp are supported | Read only |
| RTC_CONFIG_TRIANGLE_GEOMETRY | checks if triangle geometries are supported | Read only |
| RTC_CONFIG_QUAD_GEOMETRY | checks if quad geometries are supported | Read only |
| RTC_CONFIG_LINE_GEOMETRY | checks if line geometries are supported | Read only |
| RTC_CONFIG_HAIR_GEOMETRY | checks if hair geometries are supported | Read only |
| RTC_CONFIG_SUBDIV_GEOMETRY | checks if subdivision meshes are supported | Read only |
| RTC_CONFIG_USER_GEOMETRY | checks if user geometries are supported | Read only |
| RTC_CONFIG_RAY_MASK | checks if ray masks are supported | Read only |
| RTC_CONFIG_BACKFACE_CULLING | checks if backface culling is supported | Read only |
| RTC_CONFIG_INTERSECTION_FILTER | checks if intersection filters are enabled | Read only |
| RTC_CONFIG_INTERSECTION_FILTER_RESTORE | checks if intersection filters restore the previous hit | Read only |
| RTC_CONFIG_IGNORE_INVALID_RAYS | checks if invalid rays are ignored | Read only |
| RTC_CONFIG_TASKING_SYSTEM | returns the tasking system used (0 = INTERNAL, 1 = TBB, 2 = PPL) | Read only |
| RTC_SOFTWARE_CACHE_SIZE | configures the software cache size (used, for instance, to cache subdivision surfaces); the size is specified as an integer number of bytes. The software cache cannot be configured during rendering. | Write only |
| RTC_CONFIG_COMMIT_JOIN | checks if rtcCommitJoin can be used to join a build operation (not supported when Embree is compiled with some older TBB versions) | Read only |
| RTC_CONFIG_COMMIT_THREAD | checks if rtcCommitThread is available (not supported when Embree is compiled with some older TBB versions) | Read only |
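The single-integer version returned for RTC_CONFIG_VERSION (e.g. via rtcDeviceGetParameter1i(device, RTC_CONFIG_VERSION)) is convenient for version checks. The sketch below assumes the encoding major*10000 + minor*100 + patch, which is inferred from the v2.8.2 → 20802 example rather than stated explicitly; EmbreeVersion, decodeVersion, and encodeVersion are hypothetical helpers.

```cpp
#include <cassert>

struct EmbreeVersion {
  int majorVersion, minorVersion, patchVersion;
};

// Assumed encoding: major*10000 + minor*100 + patch.
constexpr EmbreeVersion decodeVersion(int v) {
  return {v / 10000, (v / 100) % 100, v % 100};
}
constexpr int encodeVersion(int majorV, int minorV, int patchV) {
  return majorV * 10000 + minorV * 100 + patchV;
}
```

A minimum-version check then becomes a plain integer comparison, e.g. version >= encodeVersion(2, 16, 0).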

For example, to configure the size of the internal software cache that is used to handle subdivision surfaces, use the RTC_SOFTWARE_CACHE_SIZE parameter to set the desired size of the cache in bytes:

rtcDeviceSetParameter1i(device, RTC_SOFTWARE_CACHE_SIZE, bytes);

The software cache cannot be configured while any Embree API call is executing. It is best to configure the size of the cache only once at application start.

Limiting the Number of Build Threads

You can use the TBB API to limit the number of threads used by Embree during hierarchy construction. To do so, create a global tbb::task_scheduler_init object, initialized with the number of threads to use:

#include <tbb/tbb.h>

tbb::task_scheduler_init init(numThreads);

Thread Creation and Affinity Settings

Tasking systems like TBB create worker threads on demand which will add a runtime overhead for the very first rtcCommit call. In case you want to benchmark the scene build time, you should start threads at application startup. You can let Embree start TBB threads by passing start_threads=1 to the init parameter of rtcNewDevice.

On machines with a high thread count (e.g. dual-socket Xeon or Xeon Phi machines), affinitizing TBB worker threads increases build and rendering performance. You can let Embree affinitize TBB worker threads by passing set_affinity=1 to the init parameter of rtcNewDevice. By default, Embree does not affinitize threads, with the exception of Xeon Phi processors, where they are affinitized by default.

All Embree tutorials automatically start and affinitize TBB worker threads by passing start_threads=1,set_affinity=1 to rtcNewDevice.

Huge Page Support

Embree supports 2MB huge pages under Windows, Linux, and MacOSX. Under Linux huge page support is enabled by default and under Windows and MacOSX disabled by default. Huge page support can get enabled in Embree by passing hugepages=1 to rtcNewDevice or disabled by passing hugepages=0 to rtcNewDevice.

We recommend using 2MB huge pages with Embree under Linux, as this improves ray tracing performance by about 5-10%. Under Windows, using huge pages requires the application to run in elevated mode, which is a security issue. Under MacOSX, huge pages are rarely available, as memory tends to get fragmented quickly.

Huge Pages under Windows

To use huge pages under Windows, the current user must have the “Lock pages in memory” (SeLockMemoryPrivilege) assigned. This can be configured through the “Local Security Policy” application, by adding a user to “Local Policies” -> “User Rights Assignment” -> “Lock pages in memory”. You have to log out and in again for this change to take effect.

Further, your application has to be executed as an elevated process (“Run as administrator”) and the “SeLockMemoryPrivilege” must explicitly be enabled by your application. Example code on how to enable this privilege can be found in the “common/sys/alloc.cpp” file of Embree. Alternatively, Embree will try to enable this privilege when passing enable_selockmemoryprivilege=1 to rtcNewDevice. Further, huge pages have to be enabled in Embree by passing hugepages=1 to rtcNewDevice.

After the system has been running for a while, physical memory gets fragmented, which can slow down the allocation of huge pages significantly.

Huge Pages under Linux

Linux supports transparent huge pages and explicit huge pages. To enable transparent huge page support under Linux execute the following as root:

echo always > /sys/kernel/mm/transparent_hugepage/enabled

When transparent huge pages are enabled, the kernel tries to merge 4k pages to 2MB pages when possible as a background job. See the following webpage for more information on transparent huge pages under Linux https://www.kernel.org/doc/Documentation/vm/transhuge.txt. In this mode each application, including your rendering application building on Embree, will automatically tend to use huge pages.

With transparent huge pages, the transition from 4k to 2MB pages might take some time. For that reason, Embree also supports allocating 2MB pages directly when a huge page pool is configured. Such a pool can be configured by writing the number of huge pages to allocate to /proc/sys/vm/nr_overcommit_hugepages as root user. E.g. to configure 2GB of address space (1000 pages of 2MB each) for huge page allocation, execute the following as root:

echo 1000 > /proc/sys/vm/nr_overcommit_hugepages
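The number to write is just the desired pool size divided by the 2MB page size, rounded up. The small helper below (hypothetical, for illustration) makes the arithmetic explicit; note that 1000 pages give 2000 MiB, so exactly 2 GiB would need 1024 pages.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kHugePageBytes = 2ull << 20;  // 2 MiB

// Number of 2 MiB huge pages needed to cover `bytes`, rounded up.
constexpr std::uint64_t hugePagesFor(std::uint64_t bytes) {
  return (bytes + kHugePageBytes - 1) / kHugePageBytes;
}
```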

See the following webpage for more information on huge pages under Linux https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.

Huge Pages under MacOSX

To use huge pages under MacOSX you have to pass hugepages=1 to rtcNewDevice to enable that feature in Embree.

After the system has been running for a while, physical memory quickly gets fragmented, which causes huge page allocations to fail. For this reason, huge pages are not very useful under MacOSX in practice.

BVH Builder API

The Embree API exposes internal BVH builders to build BVHs with any desired node and leaf layout. To invoke the BVH builder, you have to create a BVH object using the rtcNewBVH function and delete it again using the rtcDeleteBVH function.

RTCBVH rtcNewBVH(RTCDevice device);
void rtcDeleteBVH(RTCBVH bvh);

This BVH object contains some builder state and a fast node allocator. Some settings have to be passed to the BVH build function:

enum RTCBuildQuality
{
  RTC_BUILD_QUALITY_LOW = 0,     //!< build low quality BVH (good for dynamic scenes)
  RTC_BUILD_QUALITY_NORMAL = 1,  //!< build standard quality BVH
  RTC_BUILD_QUALITY_HIGH = 2,    //!< build high quality BVH
};
  
struct RTCBuildSettings
{
  unsigned size;               //!< size of this structure in bytes
  RTCBuildQuality quality;     //!< quality of BVH build
  unsigned maxBranchingFactor; //!< branching factor of BVH to build
  unsigned maxDepth;           //!< maximal depth of BVH to build
  unsigned sahBlockSize;       //!< blocksize for SAH heuristic
  unsigned minLeafSize;        //!< minimal size of a leaf
  unsigned maxLeafSize;        //!< maximal size of a leaf
  float travCost;              //!< estimated cost of one traversal step
  float intCost;               //!< estimated cost of one primitive intersection
  unsigned extraSpace;         //!< for spatial splitting we need extra space at end of primitive array
};

Default values for the settings can be obtained using the rtcDefaultBuildSettings function. Using the quality setting, one can select between a faster low quality build, which is good for dynamic scenes, and a standard quality build for static scenes. One can also specify the desired maximal branching factor of the BVH (maxBranchingFactor setting), the maximal depth the BVH should have (maxDepth setting), a power of 2 block size for the SAH heuristic (sahBlockSize setting), the minimal and maximal leaf size (minLeafSize and maxLeafSize settings), and the estimated cost of one traversal step and one primitive intersection (travCost and intCost settings). To spatially split primitives in high quality mode, the builder needs some extra space at the end of the build primitive array. The amount of extra space can be passed using the extraSpace setting and should be about as large as the number of primitives. The size member must always be set to the size of the RTCBuildSettings structure in bytes.

Four callback functions have to be registered, which are invoked during the build to create BVH nodes (RTCCreateNodeFunc), set the pointers to all children (RTCSetNodeChildFunc), set the bounding boxes of all children (RTCSetNodeBoundsFunc), and create a leaf node (RTCCreateLeafFunc).

typedef void* (*RTCCreateNodeFunc) (RTCThreadLocalAllocator allocator,
                                    size_t numChildren, void* userPtr);

typedef void  (*RTCSetNodeChildFunc) (void* nodePtr, void** childPtrs, size_t numChildren,
                                      void* userPtr);

typedef void  (*RTCSetNodeBoundsFunc) (void* nodePtr, const RTCBounds** bounds, size_t numChildren,
                                       void* userPtr);

typedef void* (*RTCCreateLeafFunc) (RTCThreadLocalAllocator allocator,
                                    const RTCBuildPrimitive* primitives, size_t numPrimitives,
                                    void* userPtr);

typedef void  (*RTCSplitPrimitiveFunc) (const RTCBuildPrimitive& prim,
    unsigned dim, float pos, RTCBounds& lbounds, RTCBounds& rbounds, void* userPtr);

The RTCCreateNodeFunc and RTCCreateLeafFunc type callbacks are passed a thread local allocator object that should be used for fast allocation of nodes using the rtcThreadLocalAlloc function.

void* rtcThreadLocalAlloc(RTCThreadLocalAllocator allocator, size_t bytes, size_t align);

We strongly recommend using this allocation mechanism, as alternatives such as the standard malloc can be more than 10x slower. The allocator object passed to the create callbacks must only be used inside the thread that invoked the callback.
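To illustrate why a thread local allocator is so much cheaper than malloc, here is a minimal bump-pointer arena: each allocation only rounds a pointer up to the requested power-of-2 alignment and advances it, and no locking is needed because every thread owns its own arena. BumpArena and tlsArena are hypothetical; this is a sketch of the general technique, not Embree's implementation of rtcThreadLocalAlloc.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical per-thread bump allocator (not Embree's implementation).
// An allocation is just an aligned pointer bump inside the current block.
class BumpArena {
 public:
  explicit BumpArena(std::size_t blockBytes = 1 << 16) : blockBytes_(blockBytes) {}

  // align must be a power of 2.
  void* alloc(std::size_t bytes, std::size_t align) {
    std::uintptr_t mask = static_cast<std::uintptr_t>(align - 1);
    std::uintptr_t p = (cur_ + mask) & ~mask;
    if (cur_ == 0 || p + bytes > end_) {
      // Current block exhausted (or none yet): start a block big enough
      // for this request including worst-case alignment padding.
      newBlock(bytes + align);
      p = (cur_ + mask) & ~mask;
    }
    cur_ = p + bytes;
    return reinterpret_cast<void*>(p);
  }

 private:
  void newBlock(std::size_t minBytes) {
    std::size_t n = minBytes > blockBytes_ ? minBytes : blockBytes_;
    blocks_.emplace_back(new unsigned char[n]);
    cur_ = reinterpret_cast<std::uintptr_t>(blocks_.back().get());
    end_ = cur_ + n;
  }

  std::size_t blockBytes_;
  std::uintptr_t cur_ = 0, end_ = 0;
  std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// One arena per thread, so the fast path needs no synchronization.
thread_local BumpArena tlsArena;
```

This sketch never frees individual allocations; all blocks are released together when the arena is destroyed.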

The RTCCreateNodeFunc callback additionally gets passed the number of children for this node in the range from 2 to maxBranchingFactor (numChildren argument).

The RTCSetNodeChildrenFunc callback function gets passed a pointer to the node (nodePtr argument), an array of pointers to the children (childPtrs argument), and the size of this array (numChildren argument).

The RTCSetNodeBoundsFunc callback function gets passed a pointer to the node (nodePtr argument), an array of pointers to the bounding boxes of the children (bounds argument), and the size of this array (numChildren argument).
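The three inner-node callbacks can be sketched as follows for an application-defined node type. To keep the example compilable without Embree, Bounds stands in for RTCBounds, and Allocator (passed by pointer) stands in for RTCThreadLocalAllocator together with rtcThreadLocalAlloc; InnerNode, MaxChildren and all field names are hypothetical. The sketch assumes the maxBranchingFactor setting is at most MaxChildren.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for RTCBounds (field names are assumptions).
struct Bounds { float lower_x, lower_y, lower_z, upper_x, upper_y, upper_z; };

// Hypothetical stand-in for the thread local allocator.
struct Allocator {
  std::vector<std::unique_ptr<unsigned char[]>> blocks;
  void* alloc(std::size_t bytes, std::size_t align) {
    std::size_t space = bytes + align;
    blocks.emplace_back(new unsigned char[space]);
    void* p = blocks.back().get();
    return std::align(align, bytes, p, space);  // aligned pointer into the block
  }
};

// Application-defined inner node; assumes maxBranchingFactor <= MaxChildren.
constexpr std::size_t MaxChildren = 4;
struct InnerNode {
  std::size_t numChildren;
  void* children[MaxChildren];
  Bounds bounds[MaxChildren];
};

// createNode: allocate an empty node with room for numChildren children.
void* createNode(Allocator* allocator, std::size_t numChildren, void* /*userPtr*/) {
  void* mem = allocator->alloc(sizeof(InnerNode), alignof(InnerNode));
  InnerNode* node = new (mem) InnerNode{};
  node->numChildren = numChildren;
  return node;
}

// setNodeChildren: store the child pointers handed over by the builder.
void setNodeChildren(void* nodePtr, void** childPtrs, std::size_t numChildren,
                     void* /*userPtr*/) {
  InnerNode* node = static_cast<InnerNode*>(nodePtr);
  for (std::size_t i = 0; i < numChildren; ++i) node->children[i] = childPtrs[i];
}

// setNodeBounds: copy each child's box; note bounds is an array of pointers.
void setNodeBounds(void* nodePtr, const Bounds** bounds, std::size_t numChildren,
                   void* /*userPtr*/) {
  InnerNode* node = static_cast<InnerNode*>(nodePtr);
  for (std::size_t i = 0; i < numChildren; ++i) node->bounds[i] = *bounds[i];
}
```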

The RTCCreateLeafFunc callback additionally gets passed an array of primitives (primitives argument) and the size of this array (numPrimitives argument). The callback should read the geomID and primID members of the passed primitives to construct the leaf.
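A matching leaf callback might copy the geomID and primID of each primitive into a fixed-size leaf. BuildPrimitive, Allocator, PrimRef, LeafNode and MaxLeafSize are hypothetical stand-ins (the field layout of BuildPrimitive is an assumption), and the sketch assumes the maxLeafSize setting does not exceed MaxLeafSize.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for RTCBuildPrimitive: a box plus the identifiers.
struct BuildPrimitive {
  float lower_x, lower_y, lower_z; int geomID;
  float upper_x, upper_y, upper_z; int primID;
};

// Hypothetical stand-in for the thread local allocator.
struct Allocator {
  std::vector<std::unique_ptr<unsigned char[]>> blocks;
  void* alloc(std::size_t bytes, std::size_t align) {
    std::size_t space = bytes + align;
    blocks.emplace_back(new unsigned char[space]);
    void* p = blocks.back().get();
    return std::align(align, bytes, p, space);
  }
};

// Application-defined leaf: up to MaxLeafSize (geomID, primID) pairs.
constexpr std::size_t MaxLeafSize = 8;
struct PrimRef { int geomID, primID; };
struct LeafNode {
  std::size_t numPrims;
  PrimRef prims[MaxLeafSize];
};

void* createLeaf(Allocator* allocator, const BuildPrimitive* primitives,
                 std::size_t numPrimitives, void* /*userPtr*/) {
  assert(numPrimitives <= MaxLeafSize);  // relies on maxLeafSize <= MaxLeafSize
  void* mem = allocator->alloc(sizeof(LeafNode), alignof(LeafNode));
  LeafNode* leaf = new (mem) LeafNode{};
  leaf->numPrims = numPrimitives;
  // Only the identifiers are kept; the boxes are stored in the parent node.
  for (std::size_t i = 0; i < numPrimitives; ++i)
    leaf->prims[i] = PrimRef{primitives[i].geomID, primitives[i].primID};
  return leaf;
}
```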

The RTCSplitPrimitiveFunc callback is invoked in high quality mode to split a primitive (prim argument) at a specified position (pos argument) along a specified dimension (dim argument). The callback should return the bounds of the clipped left and right parts of the primitive through the lbounds and rbounds arguments.
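For triangle meshes, a split callback can compute tighter bounds than simply cutting the primitive's box: walk the three edges, assign each vertex to the side(s) of the plane it lies on, and add every edge/plane intersection point to both sides. The sketch below does this and then clips both results to the primitive's current box, since prim may already be a fragment produced by an earlier split. It assumes a single hypothetical Scene (one vertex and index buffer, reached through userPtr, with geomID ignored); Vec3, Bounds (with array members instead of RTCBounds' fields) and splitTriangle are likewise hypothetical.

```cpp
#include <algorithm>
#include <initializer_list>
#include <limits>
#include <vector>

struct Vec3 { float v[3]; };

// Hypothetical stand-in for RTCBounds, indexable by dimension.
struct Bounds {
  float lower[3], upper[3];
  static Bounds empty() {
    Bounds b;
    for (int i = 0; i < 3; ++i) {
      b.lower[i] = std::numeric_limits<float>::infinity();
      b.upper[i] = -std::numeric_limits<float>::infinity();
    }
    return b;
  }
  void extend(const Vec3& p) {
    for (int i = 0; i < 3; ++i) {
      lower[i] = std::min(lower[i], p.v[i]);
      upper[i] = std::max(upper[i], p.v[i]);
    }
  }
};

// Hypothetical stand-in for RTCBuildPrimitive.
struct BuildPrimitive {
  float lower_x, lower_y, lower_z; int geomID;
  float upper_x, upper_y, upper_z; int primID;
};

// Hypothetical application scene: one triangle mesh, 3 indices per triangle.
struct Scene { std::vector<Vec3> vertices; std::vector<unsigned> indices; };

void splitTriangle(const BuildPrimitive& prim, unsigned dim, float pos,
                   Bounds& lbounds, Bounds& rbounds, void* userPtr) {
  const Scene* scene = static_cast<const Scene*>(userPtr);
  Vec3 p[3];
  for (int k = 0; k < 3; ++k) p[k] = scene->vertices[scene->indices[3 * prim.primID + k]];
  lbounds = Bounds::empty();
  rbounds = Bounds::empty();
  for (int k = 0; k < 3; ++k) {
    const Vec3& a = p[k];
    const Vec3& b = p[(k + 1) % 3];
    float da = a.v[dim] - pos, db = b.v[dim] - pos;
    if (da <= 0.0f) lbounds.extend(a);  // vertex on (or at) the left side
    if (da >= 0.0f) rbounds.extend(a);  // vertex on (or at) the right side
    if ((da < 0.0f && db > 0.0f) || (da > 0.0f && db < 0.0f)) {
      // Edge strictly crosses the plane: the crossing point bounds both parts.
      float t = da / (da - db);
      Vec3 c;
      for (int i = 0; i < 3; ++i) c.v[i] = a.v[i] + t * (b.v[i] - a.v[i]);
      c.v[dim] = pos;  // snap onto the plane to avoid rounding drift
      lbounds.extend(c);
      rbounds.extend(c);
    }
  }
  // Clip to the primitive's current box, which may already be a fragment.
  const float plo[3] = {prim.lower_x, prim.lower_y, prim.lower_z};
  const float phi[3] = {prim.upper_x, prim.upper_y, prim.upper_z};
  for (Bounds* bb : {&lbounds, &rbounds})
    for (int i = 0; i < 3; ++i) {
      bb->lower[i] = std::max(bb->lower[i], plo[i]);
      bb->upper[i] = std::min(bb->upper[i], phi[i]);
    }
}
```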

There is an optional progress callback function that can be used to monitor the progress of the BVH build.

typedef void (*RTCBuildProgressFunc) (size_t N, void* userPtr);

This progress function is called with the number N of primitives for which the build has just finished. Summed over all invocations, the N values add up to the number of primitives passed to the BVH build function.
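Because the progress callback may run on several threads at once, any accumulated counter must be updated atomically. A minimal sketch, assuming a hypothetical BuildProgress struct reached through userPtr; since rtcBuildBVH passes the same userPtr to every callback, in practice this state would typically be a member of the application's scene object.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical progress state reached through userPtr.
struct BuildProgress {
  std::atomic<std::size_t> done{0};
  std::size_t total = 0;  // number of primitives passed to the build
};

// Same shape as RTCBuildProgressFunc: (size_t N, void* userPtr).
// Called concurrently, so the counter is updated with an atomic add.
void reportProgress(std::size_t N, void* userPtr) {
  BuildProgress* p = static_cast<BuildProgress*>(userPtr);
  p->done.fetch_add(N, std::memory_order_relaxed);
}

// Fraction of primitives finished, in [0, 1]; safe to poll from another thread.
double progressFraction(const BuildProgress& p) {
  if (p.total == 0) return 1.0;
  return static_cast<double>(p.done.load(std::memory_order_relaxed)) /
         static_cast<double>(p.total);
}
```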

All callback functions are typically called from multiple threads, thus their implementation has to be thread safe.

All callback functions get a user-defined pointer (userPtr argument) as input, which is provided to the rtcBuildBVH call. This pointer can be used to access the application's scene object inside the callback functions.

The BVH build is invoked using the rtcBuildBVH function:

void* rtcBuildBVH(RTCBVH bvh,                             //!< BVH to build
                  const RTCBuildSettings& settings,       //!< settings for BVH builder
                  RTCBuildPrimitive* primitives,          //!< list of input primitives
                  size_t numPrimitives,                   //!< number of input primitives
                  RTCCreateNodeFunc createNode,           //!< creates a node
                  RTCSetNodeChildrenFunc setNodeChildren, //!< sets the pointers to all children
                  RTCSetNodeBoundsFunc setNodeBounds,     //!< sets the bounds of all children
                  RTCCreateLeafFunc createLeaf,           //!< creates a leaf
                  RTCSplitPrimitiveFunc splitPrimitive,   //!< splits a primitive into two halves
                  RTCBuildProgressFunc buildProgress,     //!< used to report build progress
                  void* userPtr);                         //!< user pointer passed to callback functions

The function gets passed the BVH object (bvh argument), the build settings to use (settings argument), the array of primitives (primitives argument) and its size (numPrimitives argument), the previously described callback function pointers, and a user-defined pointer (userPtr argument) that is passed to all callback functions. The primitive split function pointer (splitPrimitive argument) may be NULL; in that case, no spatial splits are performed in high quality mode. The function pointer used to report the build progress (buildProgress argument) is optional and may also be NULL.
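To make the roles of the callbacks concrete, the following toy builder invokes callbacks of the same shape while recursively median-splitting the primitives along the widest axis into binary nodes. It is a hypothetical, sequential illustration of the callback protocol only: Embree's builder uses the SAH heuristic, runs in parallel, supports wider nodes and spatial splits, and may invoke the callbacks in a different order. All types and function-pointer aliases here are stand-ins rather than Embree's declarations, and the allocator argument is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins (not Embree's types) so the sketch compiles alone.
struct Bounds { float lower[3], upper[3]; };
struct BuildPrimitive { Bounds box; int geomID, primID; };

using CreateNodeFn = void* (*)(std::size_t numChildren, void* userPtr);
using SetChildrenFn = void (*)(void* node, void** children, std::size_t n, void* userPtr);
using SetBoundsFn = void (*)(void* node, const Bounds** bounds, std::size_t n, void* userPtr);
using CreateLeafFn = void* (*)(const BuildPrimitive* prims, std::size_t n, void* userPtr);
using ProgressFn = void (*)(std::size_t n, void* userPtr);

struct Callbacks {
  CreateNodeFn createNode;
  SetChildrenFn setNodeChildren;
  SetBoundsFn setNodeBounds;
  CreateLeafFn createLeaf;
  ProgressFn buildProgress;  // may be nullptr, like the optional argument
  void* userPtr;             // passed unchanged to every callback
};

static Bounds boundsOf(const BuildPrimitive* p, std::size_t n) {
  Bounds b = p[0].box;
  for (std::size_t i = 1; i < n; ++i)
    for (int k = 0; k < 3; ++k) {
      b.lower[k] = std::min(b.lower[k], p[i].box.lower[k]);
      b.upper[k] = std::max(b.upper[k], p[i].box.upper[k]);
    }
  return b;
}

// Requires n >= 1 and maxLeafSize >= 1. Returns the node produced by the
// callbacks and writes the bounds of the whole subtree to outBounds.
void* buildRecursive(BuildPrimitive* prims, std::size_t n, std::size_t maxLeafSize,
                     const Callbacks& cb, Bounds& outBounds) {
  outBounds = boundsOf(prims, n);
  if (n <= maxLeafSize) {
    void* leaf = cb.createLeaf(prims, n, cb.userPtr);
    if (cb.buildProgress) cb.buildProgress(n, cb.userPtr);  // n primitives done
    return leaf;
  }
  int axis = 0;  // widest axis of the subtree bounds
  for (int k = 1; k < 3; ++k)
    if (outBounds.upper[k] - outBounds.lower[k] >
        outBounds.upper[axis] - outBounds.lower[axis]) axis = k;
  std::size_t mid = n / 2;  // median split by box center along that axis
  std::nth_element(prims, prims + mid, prims + n,
                   [axis](const BuildPrimitive& a, const BuildPrimitive& b) {
                     return a.box.lower[axis] + a.box.upper[axis] <
                            b.box.lower[axis] + b.box.upper[axis];
                   });
  void* node = cb.createNode(2, cb.userPtr);
  Bounds childBounds[2];
  void* children[2] = {
      buildRecursive(prims, mid, maxLeafSize, cb, childBounds[0]),
      buildRecursive(prims + mid, n - mid, maxLeafSize, cb, childBounds[1])};
  const Bounds* boundsPtrs[2] = {&childBounds[0], &childBounds[1]};
  cb.setNodeChildren(node, children, 2, cb.userPtr);
  cb.setNodeBounds(node, boundsPtrs, 2, cb.userPtr);
  return node;
}
```

With maxLeafSize set to 1 and five primitives, this toy builder creates five leaves and four inner nodes, and the reported progress sums to five, matching the accumulation rule described for the progress callback.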

For static scenes that do not require a further rtcBuildBVH call, one should call the rtcMakeStaticBVH function after the build, which clears some internal data.

void rtcMakeStaticBVH(RTCBVH bvh);

  1. Note that Ng is in object space and needs to be transformed to world space for instanced scenes.