3.4. Data Containers
A third, and very important, part of the AstroData core package is the data
container. We have chosen to extend Astropy’s NDData with our own
requirements, particularly lazy-loading of data by opening the FITS files
in read-only, memory-mapping mode, and exploiting the windowing capability of
astropy.io.fits (using section) to reduce our memory requirements, which
becomes important when reducing data (e.g., stacking).
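As a rough illustration of the two astropy.io.fits features mentioned above (memory-mapped, read-only opening and the section attribute), the following sketch writes a throwaway FITS file and reads back only a small window of it. The file name and array sizes are arbitrary choices for the example, not anything used by AstroData itself.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

full = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)

with tempfile.TemporaryDirectory() as tmpdir:
    # Throwaway file so the example is self-contained.
    path = os.path.join(tmpdir, "example.fits")
    fits.PrimaryHDU(data=full).writeto(path)

    # Read-only, memory-mapped open: pixel data is not loaded here.
    with fits.open(path, mode="readonly", memmap=True) as hdul:
        hdu = hdul[0]
        # .section reads just the requested window instead of the
        # whole array that hdu.data would give; copy it so nothing
        # refers to the file once it is closed.
        window = np.array(hdu.section[10:20, 30:40])

print(window.shape)
```

Only the 10x10 window is materialized as an array; the rest of the image stays on disk.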
We’ll describe here how we depart from NDData, and how we integrate the
data containers with the rest of the package. Please refer to NDData for the
full interface.
Our main data container is astrodata.NDAstroData. Fundamentally, it is
a derivative of astropy.nddata.NDData, plus a number of mixins to add
functionality:
class NDAstroData(AstroDataMixin, NDArithmeticMixin, NDSlicingMixin, NDData):
    ...
This gives us, out of the box, proper arithmetic with error propagation and slicing of the data with array syntax.
Our first customization is NDAstroData.__init__. It relies mostly on the
upstream initialization, but customizes it because our class is initialized
with lazy-loaded data wrapped in a custom class
(astrodata.fits.FitsLazyLoadable) that mimics an astropy.io.fits HDU
instance just enough to play along with NDData’s initialization code.
FitsLazyLoadable is an integral part of our memory-mapping scheme, and
among other things it will scale data on the fly, as memory-mapped FITS data
can only be read unscaled. Our NDAstroData redefines the properties data,
uncertainty, and mask, in two ways:
- To deal with the fact that our class is storing FitsLazyLoadable
  instances, not arrays, as NDData would expect. This is to keep data out of
  memory as long as possible.
- To replace lazy-loaded data with a real in-memory array, under certain
  conditions (e.g., if the data is modified, as we won’t apply the changes to
  the original file!).
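The two behaviors in the list above can be sketched with plain Python properties. Everything here is hypothetical: LazyScaled and Container are illustrative stand-ins, not the real FitsLazyLoadable or NDAstroData, and the choice not to cache loaded data is an assumption of the sketch. The scaling follows the usual FITS convention, physical = BZERO + BSCALE * raw.

```python
import numpy as np


class LazyScaled:
    """Hypothetical lazy object: keeps raw values, scales only on load."""

    def __init__(self, raw, bscale=1.0, bzero=0.0):
        self._raw = raw
        self._bscale = bscale
        self._bzero = bzero
        self.loads = 0  # how many times the data was materialized

    @property
    def shape(self):
        return self._raw.shape

    def load(self):
        self.loads += 1
        # FITS-style scaling applied on the fly.
        return self._raw * self._bscale + self._bzero


class Container:
    """Hypothetical container whose ``data`` property hides the laziness."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        if isinstance(self._data, LazyScaled):
            # Sketch assumption: return a fresh array, cache nothing.
            return self._data.load()
        return self._data

    @data.setter
    def data(self, value):
        # Modified data replaces the lazy object with an in-memory array;
        # the original source is never written to.
        self._data = np.asarray(value)


lazy = LazyScaled(np.array([0, 1, 2], dtype=np.int16), bscale=2.0, bzero=100.0)
nd = Container(lazy)

first = nd.data       # loads and scales
second = nd.data      # loads and scales again
nd.data = first + 1   # now held in memory
third = nd.data       # no further load
```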
Our obsession with lazy-loading and discarding data is aimed at reducing memory fragmentation as much as possible. This is a real problem that can hit applications dealing with large arrays, particularly when using Python. Given the choice to optimize for speed or for memory consumption, we’ve chosen the latter, which is the more pressing issue.
We’ve added another new property, window, that can be used to
explicitly exploit the section property of astropy.io.fits, to (again)
avoid loading unneeded data into memory. This property returns an instance of
NDWindowing which, when sliced, in turn produces an instance of
NDWindowingAstroData, itself a proxy of NDAstroData. This scheme may
seem complex, but it was deemed the easiest and cleanest way to achieve the
result that we were looking for.
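The two-step scheme (a windowing object that, when sliced, yields a proxy) can be sketched as follows. All three class names and the read_section method are hypothetical, and a NumPy array stands in for the file on disk. The real NDWindowing and NDWindowingAstroData classes do considerably more.

```python
import numpy as np


class WindowedView:
    """Hypothetical proxy: remembers a slice and applies it on access."""

    def __init__(self, parent, window):
        self._parent = parent
        self._window = window

    @property
    def data(self):
        # Only the requested section is read from the underlying source.
        return self._parent.read_section(self._window)


class Windowing:
    """Hypothetical object returned by ``window``; slicing yields a proxy."""

    def __init__(self, parent):
        self._parent = parent

    def __getitem__(self, window):
        return WindowedView(self._parent, window)


class Source:
    """Hypothetical container; the array stands in for data on disk."""

    def __init__(self, array):
        self._array = array

    def read_section(self, window):
        return np.array(self._array[window])

    @property
    def window(self):
        return Windowing(self)


src = Source(np.arange(36).reshape(6, 6))
view = src.window[1:3, 2:5]  # nothing is read yet
section = view.data          # reads only the requested window
```

The slice is captured at the point of indexing, and the read is deferred until an attribute of the proxy is actually requested.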
The base NDAstroData class provides the memory-mapping functionality,
with other important behaviors added by the AstroDataMixin, which can
be used with other NDData-like classes (such as Spectrum1D) to add
convenience.
One addition is the variance property, which allows direct access and
setting of the data’s uncertainty, without the user needing to explicitly wrap
it as an NDUncertainty object. Internally, the variance is stored as an
ADVarianceUncertainty object, which is subclassed from Astropy’s standard
VarianceUncertainty class with the addition of a check for negative values
whenever the array is accessed.
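A convenience property of this kind can be sketched in a few lines. VarianceHolder is a hypothetical class, not the real implementation, and the text above does not say what the check does when it finds a negative value. Raising ValueError here is purely an assumption made for illustration.

```python
import numpy as np


class VarianceHolder:
    """Hypothetical holder mimicking a ``variance`` convenience property."""

    def __init__(self):
        self._variance = None

    @property
    def variance(self):
        # Check on access; the reaction (raising) is an assumption.
        if self._variance is not None and np.any(self._variance < 0):
            raise ValueError("negative variance values found")
        return self._variance

    @variance.setter
    def variance(self, value):
        # Accept a plain array; the caller wraps nothing explicitly.
        self._variance = None if value is None else np.asarray(value, dtype=float)


holder = VarianceHolder()
holder.variance = [0.5, 2.0]
print(holder.variance)
```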
AstroDataMixin also changes the default method of combining the mask
attributes during arithmetic operations from logical_or to bitwise_or,
since the individual bits in the mask have separate meanings.
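The difference between the two ways of combining masks is easy to see with NumPy directly. The bit values below are hypothetical and chosen only for the example; they are not the flag definitions used by DRAGONS.

```python
import numpy as np

# Hypothetical bit meanings, for illustration only.
BAD_PIXEL = 1
SATURATED = 4

mask_a = np.array([0, BAD_PIXEL, 0, BAD_PIXEL], dtype=np.uint16)
mask_b = np.array([0, 0, SATURATED, SATURATED], dtype=np.uint16)

logical = np.logical_or(mask_a, mask_b)  # collapses to True/False
bitwise = np.bitwise_or(mask_a, mask_b)  # keeps each flag bit

print(logical)
print(bitwise)
```

With logical_or the result only records that a pixel is flagged; with bitwise_or the last pixel still carries both the BAD_PIXEL and SATURATED bits.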
The way slicing affects the wcs is also changed, since DRAGONS regularly
uses the callable nature of gWCS objects and this is broken by the standard
slicing method.
Finally, the additional image planes and tables stored in the meta dict
are exposed as attributes of the NDAstroData object, and any image planes
that have the same shape as the parent NDAstroData object will be handled
by NDWindowingAstroData. Sections will be ignored when accessing image
planes with a different shape, as well as tables.
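The attribute exposure and the shape rule described above might look roughly like this. The class MetaBacked, the "extras" key, the windowed method and the item names are all hypothetical; the actual layout of the meta dict is not specified here.

```python
import numpy as np


class MetaBacked:
    """Hypothetical container exposing extra planes and tables as attributes."""

    def __init__(self, data, extras):
        self.data = data
        # "extras" is a made-up key; the real layout of ``meta`` may differ.
        self.meta = {"extras": dict(extras)}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        extras = self.__dict__.get("meta", {}).get("extras", {})
        if name in extras:
            return extras[name]
        raise AttributeError(name)

    def windowed(self, name, window):
        """Return the named item, sliced only if it matches the data shape."""
        item = getattr(self, name)
        if getattr(item, "shape", None) == self.data.shape:
            return item[window]
        return item  # different shape or a table: the section is ignored


nd = MetaBacked(
    data=np.zeros((4, 4)),
    extras={
        "OBJMASK": np.ones((4, 4), dtype=np.uint8),  # same shape as data
        "PROFILE": np.arange(4),                     # different shape
        "OBJCAT": [{"id": 1, "flux": 12.5}],         # table-like, no shape
    },
)

section = (slice(0, 2), slice(0, 2))
same_shape = nd.windowed("OBJMASK", section)
other_shape = nd.windowed("PROFILE", section)
table = nd.windowed("OBJCAT", section)
```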
Note
We expect to make changes to NDAstroData in future releases. In particular,
we plan to make use of the unit attribute provided by the NDData class and
increase the use of memory-mapping by default. These changes mostly represent
increased functionality and we anticipate a high (and possibly full) degree
of backward compatibility.