Star catalog formats

Star catalogs can be loaded from well-known formats (VOTable, CSV, etc.) using the STIL data loader, or they can use a binary format tailor-made for Gaia Sky. In general, the binary format loads much faster and is more compact. That’s why we use it for our big level-of-detail star catalogs based on Gaia data.

Contents

Star catalog formats
- Binary format specification
  - Metadata file
    - Version 0
    - Version 1
  - Star particle files
- LOD catalog processing

This section discusses the level-of-detail (LOD) datasets (from Gaia DR2 on) where not all data fits into the CPU memory (RAM) and especially the GPU memory (VRAM).

To solve this issue, Gaia Sky implements a LOD structure based on the spatial distribution of stars into an octree. The culling of the octree is determined using a draw distance setting, called $θ$. $θ$ is actually the minimum visual solid angle (as seen from the camera) of an octant for it to be observed and its stars to be rendered. Larger $θ$ values lead to fewer octants being observed, and smaller $θ$ values lead to more octants being observed.
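The culling rule can be illustrated with a short sketch. This is not the Gaia Sky implementation: the function names are hypothetical, and approximating the apparent angle of an octant as 2·atan(half-size / distance) is an assumption made only for illustration.

```python
import math

def apparent_angle_deg(half_size: float, distance: float) -> float:
    """Approximate angle (in degrees) subtended by an octant.

    Assumption: the octant is treated as a segment of length 2 * half_size
    seen face-on from `distance` (same units as half_size).
    """
    if distance <= 0.0:
        # Camera at the octant center: treat the octant as filling the view.
        return 180.0
    return math.degrees(2.0 * math.atan(half_size / distance))

def is_observed(half_size: float, distance: float, theta_deg: float) -> bool:
    """An octant is observed when its apparent angle reaches theta."""
    return apparent_angle_deg(half_size, distance) >= theta_deg

# The same octants tested against two draw distance settings.
octants = [(1.0, 1.0), (1.0, 1.5), (1.0, 10.0)]  # (half_size, distance)
observed_low = sum(is_observed(h, d, 50.0) for h, d in octants)
observed_high = sum(is_observed(h, d, 80.0) for h, d in octants)
```

With these sample values, the lower $θ$ selects more octants than the higher one, which is the behavior described above.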

Balancing the loading of data depends on several parameters:

- The maximum Java heap memory (set to 4 GB by default), let’s call it maxheap.
- The available graphics memory (VRAM, video RAM). It depends on your graphics card. Let’s call it VRAM.
- The draw distance setting, $θ$.
- The maximum number of loaded stars, $ν$. This is in the configuration file ($GS_CONFIG/config.yaml) under the key scene::octree::maxstars. The default value balances the maximum heap memory space and the default data set.

So basically, a low $θ$ (below 50-60 degrees) means lots of observed octants and lots of stars. Setting $θ$ very low causes Gaia Sky to try to load lots of data, eventually overflowing the heap space and triggering an OutOfMemoryError. To mitigate that, one can also increase the maximum heap space.

Finally, there is the maximum number of loaded stars, $ν$. This number is set according to the maxheap setting. When the number of loaded stars is larger than $ν$, the loaded octants that have been unobserved for the longest time are unloaded and their memory structures are freed (both in GPU and CPU). This poses a problem if the draw distance is set so that the octants observed at a single moment contain more stars than $ν$. That is why high values for $θ$ are recommended. Usually, values between 60 and 80 are fine, depending on the dataset and the machine.

$θ$: Draw distance, minimum visual solid angle for octants to be rendered
$ν$: Maximum number of stars in memory at a given time
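The unloading policy tied to $ν$ can be sketched as a least-recently-observed cache. This is a minimal illustration under stated assumptions, not the actual streaming loader: the class `OctantCache` and its method names are hypothetical, and only star counts are tracked instead of real CPU/GPU buffers.

```python
from collections import OrderedDict

class OctantCache:
    """Tracks loaded octants, ordered from least to most recently observed."""

    def __init__(self, max_stars: int):
        self.max_stars = max_stars   # plays the role of nu
        self.loaded = OrderedDict()  # octant id -> number of stars
        self.star_count = 0

    def observe(self, octant_id: int, num_stars: int) -> list:
        """Mark an octant as observed, loading it if needed.

        Returns the ids of the octants unloaded to get back under max_stars.
        """
        if octant_id in self.loaded:
            self.loaded.move_to_end(octant_id)
        else:
            self.loaded[octant_id] = num_stars
            self.star_count += num_stars
        evicted = []
        # Unload the octants that have gone unobserved the longest,
        # but never the one that was just observed.
        while self.star_count > self.max_stars and len(self.loaded) > 1:
            old_id, old_stars = self.loaded.popitem(last=False)
            self.star_count -= old_stars
            evicted.append(old_id)
        return evicted

cache = OctantCache(max_stars=100)
cache.observe(1, 60)
cache.observe(2, 30)
cache.observe(1, 60)              # octant 1 becomes the most recent
evicted = cache.observe(3, 50)    # 140 stars loaded: over the limit
```

In this run, octant 2 is unloaded first and octant 1 follows, because 110 stars still exceed the limit. When the octants observed at the same moment hold more than $ν$ stars, a policy like this ends up unloading octants that are still needed, which is the problem described above.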

Binary format specification

Gaia catalogs typically contain hundreds of millions of stars. They are too large to fit in your neighbor’s consumer GPU. In order to represent such catalogs in real time, Gaia Sky implements a level-of-detail algorithm backed by an octree. All level-of-detail catalogs use a custom binary data format designed to be compact and fast to load. This binary format can, however, also be used for smaller star catalogs. This section contains its specification.

There are two types of files: the metadata (metadata.bin) and the particle files (particles_xxxxxxx.bin). The metadata file contains all the nodes of the octree (called octants). Each octant points to a particle file, containing its particles. The number in the particle file name is the identifier of the octant. Additionally, the particle files can also be used for standalone smaller star catalogs.

All binary numbers in the metadata and particles files use big-endian byte ordering.

The distance units are internal units.

Metadata file

The metadata reader is implemented here. The metadata file contains the information of the octants of the octree. The metadata format currently has two possible versions, 0 and 1, which are automatically detected by Gaia Sky.

Version 0

Version 0 (legacy) does not contain its version number in the file itself. Instead, if the first four bytes interpreted as an integer are zero or positive, version 0 is assumed. All numbers are stored BE (big-endian). The format is the following.

1 single-precision integer (32-bit) – number of octants in the file
For each octant:
- 1 single-precision integer (32-bit) – Page ID - ID of current octant
- 3 single-precision float (32-bit * 3) – X, Y, Z cartesian coordinates in internal units
- 1 single-precision float (32-bit) – Octant half-size in X
- 1 single-precision float (32-bit) – Octant half-size in Y
- 1 single-precision float (32-bit) – Octant half-size in Z
- 8 single-precision integer (32-bit * 8) – IDs of the 8 children (-1 if no child)
- 1 single-precision integer (32-bit) – Level of octant (depth)
- 1 single-precision integer (32-bit) – Cumulative number of stars in this node and its descendants
- 1 single-precision integer (32-bit) – Number of stars in this node
- 1 single-precision integer (32-bit) – Number of children nodes
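The layout above maps directly onto a fixed-size big-endian record of 76 bytes per octant. The following is a minimal reading sketch, not the Gaia Sky reader: the function name and the dictionary keys are hypothetical, chosen here for illustration.

```python
import io
import struct

# One version-0 octant record, big-endian, no padding:
# page id (i), center x/y/z (3f), half-sizes x/y/z (3f),
# 8 child ids (8i), depth, cumulative stars, own stars, children (4i).
OCTANT_V0 = struct.Struct(">i3f3f8i4i")

def read_metadata_v0(stream) -> list:
    """Read a version-0 metadata stream into a list of octant dicts."""
    (count,) = struct.unpack(">i", stream.read(4))
    if count < 0:
        raise ValueError("negative first integer: not a version-0 file")
    octants = []
    for _ in range(count):
        f = OCTANT_V0.unpack(stream.read(OCTANT_V0.size))
        octants.append({
            "page_id": f[0],
            "center": f[1:4],
            "half_size": f[4:7],
            "children": f[7:15],      # -1 marks a missing child
            "depth": f[15],
            "stars_cumulative": f[16],
            "stars_own": f[17],
            "num_children": f[18],
        })
    return octants

# Build a one-octant file in memory and read it back.
record = OCTANT_V0.pack(0, 1.0, 2.0, 3.0, 0.5, 0.5, 0.5,
                        *([-1] * 8), 0, 10, 10, 0)
octants = read_metadata_v0(io.BytesIO(struct.pack(">i", 1) + record))
```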

Version 1

Since 3.0.4

Version 1 was introduced in Gaia Sky 3.0.4, and starts with a negative integer in the first four bytes, typically -1. Then comes the version number. The main difference with the legacy version is that the page IDs are encoded with a 64-bit integer instead of 32.

1 single-precision integer (32-bit) – special token number -1, signaling the presence of a version number
1 single-precision integer (32-bit) – version number (1 in this case)
1 single-precision integer (32-bit) – number of octants in the file
For each octant:
- 1 double-precision integer (64-bit) – Page ID - ID of current octant
- 3 single-precision float (32-bit * 3) – X, Y, Z cartesian coordinates in internal units
- 1 single-precision float (32-bit) – Octant half-size in X
- 1 single-precision float (32-bit) – Octant half-size in Y
- 1 single-precision float (32-bit) – Octant half-size in Z
- 8 double-precision integer (64-bit * 8) – IDs of the 8 children (-1 if no child)
- 1 single-precision integer (32-bit) – Level of octant (depth)
- 1 single-precision integer (32-bit) – Cumulative number of stars in this node and its descendants
- 1 single-precision integer (32-bit) – Number of stars in this node
- 1 single-precision integer (32-bit) – Number of children nodes
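Version detection and the wider 64-bit identifiers can be sketched as follows. This is an illustrative reader under stated assumptions, not the Gaia Sky code: the names `read_metadata` and `_to_octant` are hypothetical, and any version number other than 1 after the token is simply rejected.

```python
import io
import struct

RECORD_V0 = struct.Struct(">i6f8i4i")  # 32-bit ids, 76 bytes
RECORD_V1 = struct.Struct(">q6f8q4i")  # 64-bit ids, 112 bytes

def _to_octant(f) -> dict:
    return {"page_id": f[0], "center": f[1:4], "half_size": f[4:7],
            "children": f[7:15], "depth": f[15],
            "stars_cumulative": f[16], "stars_own": f[17],
            "num_children": f[18]}

def read_metadata(stream):
    """Return (version, octants), detecting the version from the header."""
    (first,) = struct.unpack(">i", stream.read(4))
    if first >= 0:
        # Legacy file: the first integer is already the octant count.
        version, count, record = 0, first, RECORD_V0
    else:
        # Negative token: version number and octant count follow.
        version, count = struct.unpack(">ii", stream.read(8))
        if version != 1:
            raise ValueError(f"unsupported metadata version {version}")
        record = RECORD_V1
    octants = [_to_octant(record.unpack(stream.read(record.size)))
               for _ in range(count)]
    return version, octants

# A version-1 file with one octant whose ids do not fit in 32 bits.
big_id = 2**40
rec = RECORD_V1.pack(big_id, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0,
                     big_id + 1, *([-1] * 7), 1, 9, 4, 1)
version, octants = read_metadata(
    io.BytesIO(struct.pack(">iii", -1, 1, 1) + rec))
```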

Star particle files

A particle file contains the information of a number of stars. These can be the stars belonging to a particular octant in a LOD octree, or all the stars in a particular star catalog.

The class in charge of loading and writing binary star particle files is the BinaryDataProvider.

The binary readers/writers are implemented in the following files:

Version 0 was used in DR2, version 1 was used mainly in the first batch of eDR3. Version 2 is used in the second batch of eDR3 and future DRs. Versions 0 and 1 are not annotated, so they are detected using the file name. Starting from version 2, the version number is in the file header, using a special token (negative integer).

Version 0

Version 0 is specified below. It contains a header with the number of stars, followed by a fixed set of fields for each star. It contains a 3-integer set which is the Tycho identifier, mainly for compatibility with TGAS. All numbers are stored BE (big-endian).

1 single-precision integer (32-bit) – number of stars in the file
For each star:
- 3 double-precision floats (64-bit * 3) – X, Y, Z cartesian coordinates in internal units
- 3 double-precision floats (64-bit * 3) – Vx, Vy, Vz - cartesian velocity vector in internal units per year
- 3 double-precision floats (64-bit * 3) – mualpha, mudelta, radvel - proper motions and radial velocity
- 4 single-precision floats (32-bit * 4) – appmag, absmag, color, size - Magnitudes, colors (encoded), and size (a derived quantity, for rendering)
- 1 single-precision integer (32-bit) – HIP number (if any, otherwise negative)
- 3 single-precision integer (32-bit * 3) – Tycho identifiers
- 1 double-precision integer (64-bit) – Gaia SourceID
- 1 single-precision integer (32-bit) – namelen -> Length of name
- namelen * char (16-bit * namelen) – Characters of the star name, where each character is encoded with UTF-16

Version 1

Version 1 is the same as version 0 but without the Tycho identifiers.

1 single-precision integer (32-bit) – number of stars in the file
For each star:
- 3 double-precision floats (64-bit * 3) – X, Y, Z cartesian coordinates in internal units
- 3 double-precision floats (64-bit * 3) – Vx, Vy, Vz - cartesian velocity vector in internal units per year
- 3 double-precision floats (64-bit * 3) – mualpha, mudelta, radvel - proper motions and radial velocity
- 4 single-precision floats (32-bit * 4) – appmag, absmag, color, size - Magnitudes, colors (encoded), and size (a derived quantity, for rendering)
- 1 single-precision integer (32-bit) – HIP number (if any, otherwise negative)
- 1 double-precision integer (64-bit) – Gaia SourceID
- 1 single-precision integer (32-bit) – namelen -> Length of name
- namelen * char (16-bit * namelen) – Characters of the star name, where each character is encoded with UTF-16

Version 2

Since 3.0.2

This version is much more compact, and it uses smaller data types when possible. The header contains a token integer (-1) marking the following version number, plus the number of stars. All numbers are stored BE (big-endian).

1 single-precision integer (32-bit) – special token number -1, signaling the presence of a version number
1 single-precision integer (32-bit) – version number (2 in this case)
1 single-precision integer (32-bit) – number of stars in the file
For each star:
- 3 double-precision floats (64-bit * 3) – X, Y, Z cartesian coordinates in internal units
- 3 single-precision floats (32-bit * 3) – Vx, Vy, Vz - cartesian velocity vector in internal units per year
- 3 single-precision floats (32-bit * 3) – mualpha, mudelta, radvel - proper motions and radial velocity
- 4 single-precision floats (32-bit * 4) – appmag, absmag, color, size - Magnitudes, colors (encoded), and size (a derived quantity, for rendering)
- 1 single-precision integer (32-bit) – HIP number (if any, otherwise negative)
- 1 double-precision integer (64-bit) – Gaia SourceID
- 1 single-precision integer (32-bit) – namelen -> Length of name
- namelen * char (16-bit * namelen) – Characters of the star name, where each character is encoded with UTF-16
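As an illustration of the version 2 layout, the fixed part of each star record takes 80 bytes and is followed by the UTF-16 name. The sketch below is not the BinaryDataProvider implementation: the function name and dictionary keys are hypothetical, and names are assumed to be stored as big-endian UTF-16 without a byte-order mark, in line with the rest of the file.

```python
import io
import struct

# Fixed part of a version-2 star record, big-endian, no padding:
# position (3d), velocity (3f), mualpha/mudelta/radvel (3f),
# appmag/absmag/color/size (4f), HIP (i), Gaia SourceID (q), namelen (i).
STAR_V2_FIXED = struct.Struct(">3d3f3f4fiqi")

def read_particles_v2(stream) -> list:
    """Read a version-2 particle stream into a list of star dicts."""
    token, version, count = struct.unpack(">iii", stream.read(12))
    if token >= 0 or version != 2:
        raise ValueError("not a version-2 particle file")
    stars = []
    for _ in range(count):
        f = STAR_V2_FIXED.unpack(stream.read(STAR_V2_FIXED.size))
        namelen = f[15]
        # Each character takes 16 bits, so the name spans 2 * namelen bytes.
        name = stream.read(2 * namelen).decode("utf-16-be")
        stars.append({
            "position": f[0:3],
            "velocity": f[3:6],
            "mualpha": f[6], "mudelta": f[7], "radvel": f[8],
            "appmag": f[9], "absmag": f[10],
            "color": f[11], "size": f[12],
            "hip": f[13],
            "source_id": f[14],
            "name": name,
        })
    return stars

# Build a one-star file in memory (sample values only) and read it back.
name = "Sirius"
payload = (struct.pack(">iii", -1, 2, 1)
           + STAR_V2_FIXED.pack(1.0, 2.0, 3.0, 0.5, 0.25, 0.125,
                                1.5, 2.5, 3.5, -1.5, 1.5, 0.0, 1.0,
                                32349, 123456789012, len(name))
           + name.encode("utf-16-be"))
stars = read_particles_v2(io.BytesIO(payload))
```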

The RGB color of stars uses 8 bits per channel in RGBA, and is encoded into a single float using the libgdx Color class.
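The packing can be sketched outside of libgdx as follows. This mirrors our understanding of what the libgdx `Color.toFloatBits()` method does (ABGR byte order, with the lowest alpha bit cleared so that the bit pattern is never NaN or infinity). Treat the byte order and the mask as assumptions to verify against the libgdx source; the function names are hypothetical.

```python
import struct

def pack_color(r: int, g: int, b: int, a: int) -> float:
    """Pack four 8-bit channels (0-255) into a single 32-bit float.

    Assumed layout: alpha in the top byte, then blue, green, red.
    The 0xFEFFFFFF mask clears the lowest alpha bit, which keeps the
    exponent below all ones, so the result is always a finite float.
    """
    bits = ((a << 24) | (b << 16) | (g << 8) | r) & 0xFEFFFFFF
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def unpack_color(value: float) -> tuple:
    """Recover (r, g, b, a) from a float produced by pack_color."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return (bits & 0xFF, (bits >> 8) & 0xFF,
            (bits >> 16) & 0xFF, (bits >> 24) & 0xFF)

packed = pack_color(255, 128, 0, 255)
channels = unpack_color(packed)
```

The red, green and blue channels survive the round trip unchanged, while alpha can lose its lowest bit (255 becomes 254 in this example).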

Some discussion on memory issues and the streaming loader can be found here.

LOD catalog processing

All LOD catalogs are based on one of the Gaia data releases (DR2, DR3, etc.), and they also include the brighter stars from the Hipparcos catalog. The official Gaia-Hipparcos crossmatch is used to identify stars that are contained in both catalogs. In this case, the parallax is taken from the source that has the smaller parallax error. The rest of the data is taken from the Gaia catalog, but some attributes are merged (for instance, the final star contains both the HIP number and the Gaia source id).
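The merge rule for crossmatched stars can be sketched as follows. This is an illustration only, not code from the catalog generator: the function name and the dictionary keys (`pllx`, `pllx_err`, `hip`, `source_id`) are hypothetical, and keeping the Gaia parallax when both errors are equal is an assumption.

```python
def merge_crossmatched(gaia: dict, hip: dict) -> dict:
    """Merge a Gaia source with its Hipparcos counterpart."""
    merged = dict(gaia)          # the rest of the data comes from Gaia
    merged["hip"] = hip["hip"]   # keep both identifiers
    if hip["pllx_err"] < gaia["pllx_err"]:
        # Hipparcos has the more precise parallax for this star.
        merged["pllx"] = hip["pllx"]
        merged["pllx_err"] = hip["pllx_err"]
    return merged

# Sample values for illustration only.
gaia_star = {"source_id": 1001, "pllx": 379.0, "pllx_err": 1.2, "g_mag": 8.5}
hip_star = {"hip": 32349, "pllx": 379.2, "pllx_err": 0.4}
merged = merge_crossmatched(gaia_star, hip_star)
```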

The LOD catalogs are generated using a program written in Rust, called gaiasky-catgen. The source code can be found in this repository. In the LOD generation process, each star is processed individually. The catalog is filtered according to the input parameters, and some corrections are applied to star attributes.

Catalogs

For each Gaia data release, we offer a selection of subsets which contain different cuts of the whole data. These subsets are typically computed using the criterion of parallax relative error, which measures how large the error in parallax is with respect to the parallax value. We define a cut-off value, $s$, which is the maximum fraction of the parallax allowed for the error, in [0,1]:

$\mathrm{err}_{\mathrm{pllx}} < \mathrm{pllx} \cdot s$

where $\mathrm{err}_{\mathrm{pllx}}$ is the parallax error, and $\mathrm{pllx}$ is the parallax for that source. The cut-off value is usually split into two different values, one for bright stars and one for faint stars. What are bright stars and what are faint stars?

Bright – $G_{\mathrm{mag}} < 13.1$
Faint – $G_{\mathrm{mag}} \geq 13.1$

So, for example, the DR3 default catalog contains all stars up to 20%/1.5% parallax relative error for bright/faint stars. This means that all bright stars where the error is not larger than 20% of the parallax are included, and all faint stars where the error is not larger than 1.5% of the parallax are also included.
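The cut can be written as a small predicate. This is a sketch of the criterion as stated above, with the DR3 default values used as defaults; the function and parameter names are hypothetical.

```python
G_MAG_SPLIT = 13.1  # stars brighter than this use the bright cut-off

def passes_cut(pllx: float, pllx_err: float, g_mag: float,
               s_bright: float = 0.2, s_faint: float = 0.015) -> bool:
    """True if the parallax error is below the allowed fraction s."""
    s = s_bright if g_mag < G_MAG_SPLIT else s_faint
    return pllx_err < pllx * s

# A 10% relative error passes for a bright star but not for a faint one.
bright_ok = passes_cut(pllx=10.0, pllx_err=1.0, g_mag=12.0)
faint_ok = passes_cut(pllx=10.0, pllx_err=1.0, g_mag=15.0)
```

Note that, written this way, a source with a negative parallax never passes the cut, since the right-hand side is negative.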

See all the LOD catalogs we offer in our data server:

Current catalogs (DR3).

Distances

In most catalogs, distances are derived from parallaxes, using the formula

$d\,[\mathrm{pc}] = 1000 / \mathrm{pllx}\,[\mathrm{mas}]$.

All parallaxes are zero-point corrected as instructed in the official DR documentation before being converted to distances. Some parallaxes are negative. In this case, instead of discarding the star, Gaia Sky keeps it and assigns it a default parallax of 0.04 mas, which corresponds to 25 kpc.
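The conversion, including the fallback for negative parallaxes, can be sketched as follows. The function name is hypothetical, the input is assumed to be already zero-point corrected, and treating a parallax of exactly zero like a negative one is an assumption made here to avoid a division by zero.

```python
DEFAULT_PLLX_MAS = 0.04  # fallback parallax, equivalent to 25 kpc

def distance_pc(pllx_mas: float) -> float:
    """Distance in parsecs from a zero-point corrected parallax in mas."""
    if pllx_mas <= 0.0:
        # Keep the star and place it at the default distance.
        pllx_mas = DEFAULT_PLLX_MAS
    return 1000.0 / pllx_mas

near = distance_pc(10.0)       # a parallax of 10 mas
fallback = distance_pc(-0.3)   # negative parallax, uses the default
```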

However, some catalogs use distances determined elsewhere by different methods and injected into the generation process as additional columns. This is the case for the geometric (Bayesian) distances and the photometric distances catalogs.

Magnitude/color corrections

Extinction and reddening factors are applied to star magnitudes and colors, respectively.

When the extinction value $A_{g}$ is present in the catalog or in an additional column, it is applied directly to the magnitude. Otherwise, we default to the following analytical extinction,

$A_{g} = \min\left(3.2, \frac{150}{|\sin(b)|} \cdot 5.9 \times 10^{-4}\right)$,

where $b$ is the galactic latitude of the star.

Similarly, we apply the reddening value $E_{BP-RP}$ when it is in the catalog or in an additional column. Otherwise, we fall back to the following analytical determination, based on the extinction:

$E_{BP-RP} = \min\left(1.6, A_{g} \cdot 2.9 \times 10^{-4}\right)$.
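Both analytical fallbacks can be written down directly from the formulas above. This is a minimal sketch, not the catalog generator code: the function names are hypothetical, the angle $b$ is assumed to be given in degrees, and returning the cap of 3.2 when $\sin(b)$ is exactly zero is an assumption that follows the limit of the formula.

```python
import math

def analytical_extinction(b_deg: float) -> float:
    """Fallback extinction A_g for a star at angle b (degrees)."""
    sin_b = abs(math.sin(math.radians(b_deg)))
    if sin_b == 0.0:
        # The fraction diverges, so the minimum is the cap.
        return 3.2
    return min(3.2, 150.0 / sin_b * 5.9e-4)

def analytical_reddening(a_g: float) -> float:
    """Fallback reddening E_(BP-RP) derived from the extinction."""
    return min(1.6, a_g * 2.9e-4)

a_pole = analytical_extinction(90.0)   # smallest extinction
a_plane = analytical_extinction(1.0)   # capped at 3.2
e_plane = analytical_reddening(a_plane)
```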