BubbleML Documentation

The BubbleML dataset consists of several studies, each composed of multiple simulations. Each simulation is stored as one HDF5 file. Every HDF5 file stores the relevant tensor data, and each field is stored as a separate HDF5 dataset in the simulation file. The fields and their dataset keys are:

Temperature: temperature
Pressure gradient: pressure
x-velocity: velx
y-velocity: vely
Signed distance function: dfun
x-coordinate grid: x
y-coordinate grid: y
Real-valued runtime parameters: real-runtime-params
Integer-valued runtime parameters: int-runtime-params

The simulation data can be accessed using h5py. Here, we load the temperature data into a torch tensor:

import h5py
import torch

with h5py.File("<path-to-sim>", "r") as f:
    # load temperature into a torch tensor
    temp = torch.from_numpy(f['temperature'][:])
    # load runtime params into a numpy array
    real_params = f['real-runtime-params'][:]

    # 3 dimensions
    print(temp.dim())

All simulation fields are laid out in memory identically: T x Y x X. The first dimension is time, the second is the rows of the domain, and the third is the columns of the domain. This layout makes indexing HDF5 files by time faster, since each domain is laid out contiguously in memory. In our experiments, we always index by time. Every tensor field has an identical shape:

f['temperature'][:].shape == f['pressure'][:].shape == ...

For a full example of how to read and visualize each field, check the data loading example.
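The shape guarantee above can be checked before training. The following is a minimal sketch, not part of the BubbleML tooling: the helper name check_field_shapes is hypothetical, and a dictionary of NumPy arrays stands in for an open simulation file. The same function should also work on an h5py.File, since h5py datasets expose a .shape attribute.

```python
import numpy as np

# Dataset keys of the tensor fields listed above.
FIELD_KEYS = ("temperature", "pressure", "velx", "vely", "dfun")

def check_field_shapes(sim, keys=FIELD_KEYS):
    """Return the shared (T, Y, X) shape, or raise if any field disagrees."""
    shapes = {key: tuple(sim[key].shape) for key in keys}
    unique = set(shapes.values())
    if len(unique) != 1:
        raise ValueError(f"fields have mismatched shapes: {shapes}")
    return unique.pop()

# Stand-in for an open simulation file: 4 timesteps on a 3 x 5 grid.
fake_sim = {key: np.zeros((4, 3, 5)) for key in FIELD_KEYS}
n_time, n_rows, n_cols = check_field_shapes(fake_sim)

# Indexing by time pulls out one full Y x X domain.
snapshot = fake_sim["temperature"][2]
```

With a real file, indexing a dataset by a single timestep reads only that timestep rather than the whole field.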

Metadata (runtime-params)

There is a lot of metadata associated with each of the Flash-X simulations. The metadata is stored as a set of key-value pairs. Each key is a variable name; for example, ins_invreynolds is the inverse Reynolds number. Some settings may be difficult to interpret, and many will be unnecessary for most users. Others, like the Reynolds and Prandtl numbers, are important parameters in the governing equations and will be critical when implementing a physics-informed model. Here, we point out some of the important metadata keys:

Real runtime parameters (real-runtime-params):

Inverse Reynolds number: ins_invreynolds
Stefan number: mph_stefan
Prandtl number: ht_prandtl
Non-dimensionalized saturation temperature: mph_tsat
Non-dimensionalized bulk temperature: ht_tbulk
Non-dimensional max and min temperature: ht_twall_high, ht_twall_low
Domain sizes: xmin, xmax, ymin, ymax
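
Looking up these keys is easier once the key-value pairs are in a dictionary. The sketch below rests on an assumption that should be verified against an actual file: that each runtime-parameter dataset reads as a structured array with a name field (a padded byte string) and a value field. The helper params_to_dict and the numbers in the example are made up for illustration.

```python
import numpy as np

def params_to_dict(records):
    """Convert (name, value) records into a plain dict.

    Assumes each record has 'name' and 'value' fields; this layout is an
    assumption, so inspect the dataset's dtype before relying on it.
    """
    params = {}
    for record in records:
        name = record["name"]
        if isinstance(name, bytes):
            name = name.decode()
        # Names may be padded with trailing whitespace.
        params[name.strip()] = record["value"].item()
    return params

# Stand-in for f['real-runtime-params'][:] with made-up values.
dtype = [("name", "S80"), ("value", "f8")]
fake_real_params = np.array(
    [(b"ins_invreynolds".ljust(80), 0.25), (b"mph_stefan", 0.5)],
    dtype=dtype,
)
real_params = params_to_dict(fake_real_params)
```

After this, a parameter can be read as real_params["ins_invreynolds"].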

The governing equation for the vapor phase includes the thermal diffusivity. This is computed from three values:

Specific heat capacity: mph_cpgas
Density: mph_rhogas
Thermal conductivity: mph_thcogas

In the liquid phase, the thermal diffusivity is set to one.
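
As a rough sketch of that computation, the snippet below uses the standard definition of thermal diffusivity, conductivity divided by the product of density and specific heat capacity. It assumes that this definition applies directly to the stored values and that the runtime parameters are already available as a dictionary. The function names and the numbers are illustrative only.

```python
def vapor_thermal_diffusivity(params):
    """Thermal diffusivity alpha = k / (rho * c_p) for the vapor phase."""
    return params["mph_thcogas"] / (params["mph_rhogas"] * params["mph_cpgas"])

def thermal_diffusivity(params, in_vapor):
    """Diffusivity at a point: computed in the vapor, one in the liquid."""
    return vapor_thermal_diffusivity(params) if in_vapor else 1.0

# Made-up values, chosen only so the arithmetic is easy to follow.
example_params = {"mph_thcogas": 0.5, "mph_rhogas": 0.125, "mph_cpgas": 2.0}
alpha_vapor = thermal_diffusivity(example_params, in_vapor=True)
alpha_liquid = thermal_diffusivity(example_params, in_vapor=False)
```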

The integer runtime parameters include settings for the resolution and are necessary for unblocking. You may need them if you extend BubbleML and want to unblock the dataset.

Integer runtime parameters (int-runtime-params):

The number of blocks in the x,y,z directions: nblockx, nblocky, nblockz
The block sizes in the x,y,z directions: gr_tilesizex, gr_tilesizey, gr_tilesizez

The resolution in the x-direction can be computed using nblockx * gr_tilesizex. An example using the integer runtime parameters to unblock a dataset can be seen in our scripts. These integer settings are read by boxkit to simplify reconstructing a simulation.
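
The resolution formula can be written as a small helper. This is a sketch that assumes the integer runtime parameters are already in a dictionary. The y-direction key gr_tilesizey is assumed by analogy with nblocky, and the block counts and tile sizes are made up.

```python
def resolution(int_params, axis="x"):
    """Number of cells along an axis: block count times block size."""
    return int_params[f"nblock{axis}"] * int_params[f"gr_tilesize{axis}"]

# Made-up values for illustration.
example_int_params = {
    "nblockx": 4,
    "gr_tilesizex": 8,
    "nblocky": 2,
    "gr_tilesizey": 8,  # key name assumed by analogy with gr_tilesizex
}
res_x = resolution(example_int_params, "x")
res_y = resolution(example_int_params, "y")
```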

Temperature

The temperature is stored in a non-dimensionalized form, so the stored values always lie in the range [0, 1]. In studies where we vary the heater temperature, the temperature ranges from the liquid temperature to the heater temperature, so in each case the heater temperature is normalized to 1. Directly inputting the non-dimensionalized values to a neural network may be a problem, since a heater temperature of 80 degrees would appear the same as a heater temperature of 100 degrees. To resolve this, you should re-dimensionalize the temperature field. This is simple and can be done by multiplying the non-dimensionalized temperature by the heater temperature:

temp = sim_file['temperature'][:]
heater_temp = get_heater_temp(sim_file)
temp *= heater_temp

Once this is done, the temperature should be safe to use. In the studies where the heater temperature is constant (the studies where we vary the gravity or inlet velocity), this re-dimensionalization is unnecessary.

Pressure

Each of the simulation files stores the pressure gradient, not the actual pressure. This is because only the pressure gradient is used in the governing equations. The pressure is computed by solving a Poisson equation. We have noticed that the Poisson solver may not be sufficiently robust for the pressure to be used on its own. In the numerical simulations, this is fine because the main purpose of the pressure is to correct the velocities, not to serve as a truly accurate model of pressure. In our experiments, we did not use the pressure, but we make note of it for future users who may be interested. It would be interesting to incorporate the pressure into models and test whether velocity predictions improve. The Poisson solver will likely be improved in a future version of Flash-X.

Distance function

Each simulation includes a field dfun, which is a signed distance function to the nearest bubble interface. When a point is in the vapor phase, dfun > 0. When a point is in the liquid phase, dfun <= 0. This field can be used to get a mask for all liquid points, all vapor points, or points along the bubble interface. The data loading example shows how to compute the liquid-vapor interface using the same Heaviside function as the simulation. We use the distance function to generate a mask of bubble locations (i.e., points in the vapor phase).
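
The sign convention translates directly into boolean masks. The sketch below is illustrative: the function names are hypothetical, and the interface detection is a simple sign-change test between neighboring cells, not the Heaviside function used by the simulation.

```python
import numpy as np

def phase_masks(dfun):
    """Return (vapor, liquid) boolean masks from the signed distance field."""
    vapor = dfun > 0   # vapor phase: dfun > 0
    liquid = ~vapor    # liquid phase: dfun <= 0
    return vapor, liquid

def interface_mask(dfun):
    """Mark cells whose phase differs from a neighbor along Y or X.

    A simple stand-in for interface detection, not the simulation's method.
    """
    vapor = dfun > 0
    edge = np.zeros_like(vapor)
    # Phase changes between horizontally adjacent cells (X is the last axis).
    change_x = vapor[..., :, 1:] != vapor[..., :, :-1]
    edge[..., :, 1:] |= change_x
    edge[..., :, :-1] |= change_x
    # Phase changes between vertically adjacent cells (Y is the second-to-last axis).
    change_y = vapor[..., 1:, :] != vapor[..., :-1, :]
    edge[..., 1:, :] |= change_y
    edge[..., :-1, :] |= change_y
    return edge

# Tiny made-up field: a single vapor cell surrounded by liquid.
example_dfun = np.array([[-2.0, -1.0, -2.0],
                         [-1.0,  1.0, -1.0],
                         [-2.0, -1.0, -2.0]])
vapor, liquid = phase_masks(example_dfun)
edge = interface_mask(example_dfun)
```

Because the functions index from the trailing axes, they apply equally to a single Y x X snapshot or to a full T x Y x X field.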

The Domain Boundary

The simulation data we provide does not include the boundary. For instance, f['temperature'][:, 0, 0] (row 0, column 0) is not indexing the heater. Instead, it is indexing the cell just above the heater. Similarly, f['temperature'][:, 10, 0] (row 10, column 0) is not indexing the left wall; it is indexing the cell to the right of the wall. If you want to account for boundaries in your model (perhaps for a physics-informed neural network), you must either handle them implicitly or extend the domain with the boundary info. In our experiments, we treat the boundary implicitly and assume that the model will be able to capture the boundary info from the input history.

Pool boiling experiments have walls on the left and right sides, and an outlet at the top. Flow boiling experiments have an inlet on the left and an outlet on the right. The top is a no-slip wall. In both cases, the heater is always along the bottom of the domain. The domain boundary is fixed for every timestep: the heater temperature does not change, a wall is always a wall, and an outlet is always an outlet.
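
One way to extend the domain with boundary info is to pad a field with an extra row for the heater. The sketch below makes several assumptions: the heater spans the whole bottom edge, row 0 is the row adjacent to it, and the non-dimensional heater temperature is 1. The helper add_heater_row is hypothetical, and walls, inlets, and outlets would need their own treatment.

```python
import numpy as np

def add_heater_row(temp, heater_value=1.0):
    """Prepend a heater row along the Y axis of a T x Y x X field."""
    # No padding in time or X; one extra row before row 0 in Y.
    pad_width = ((0, 0), (1, 0), (0, 0))
    return np.pad(temp, pad_width, mode="constant", constant_values=heater_value)

# Made-up field: 2 timesteps on a 3 x 4 grid.
example_temp = np.zeros((2, 3, 4))
extended = add_heater_row(example_temp)
```

Since the boundary is fixed for every timestep, the same constant row is added to each timestep.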

Steady-State

The Flash-X simulations take many iterations before they reach a quasi-steady state. This essentially means that the initial timesteps may not be physically “valid.” The BubbleML dataset includes these “unsteady” initial states. It is very reasonable (and probably best) to exclude these initial steps. In our experiments, we drop the first 30 timesteps for datasets discretized to 1 unit of non-dimensional time and the first 300 timesteps for datasets discretized to 0.1 units of non-dimensional time. Dropping more timesteps may be reasonable, but dropping more than 60 (or 600) is likely overkill.
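
Because time is the first dimension, dropping the initial steps is a single slice. A minimal sketch follows; the helper name and its parameters are hypothetical, and the defaults mirror the 30 units of non-dimensional time used in our experiments.

```python
import numpy as np

def drop_unsteady(field, steps_per_unit_time=1, unsteady_time=30):
    """Drop the initial unsteady timesteps from a T x Y x X field."""
    n_drop = unsteady_time * steps_per_unit_time
    return field[n_drop:]

# Made-up fields whose values equal their timestep index.
coarse = np.arange(100).reshape(100, 1, 1)  # 1 unit of time per step
fine = np.arange(400).reshape(400, 1, 1)    # 0.1 units of time per step

steady_coarse = drop_unsteady(coarse)                       # drops 30 steps
steady_fine = drop_unsteady(fine, steps_per_unit_time=10)   # drops 300 steps
```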

Extending BubbleML

We provide a reproducibility capsule for running the simulations with Flash-X. It includes lab notebooks for running simulations, as well as the analysis scripts and submission files used to generate BubbleML.