Coaddition and Reference Pipeline

Note: this documentation file may not be completely up to date.

Besides the regular pipeline that processes new images, there are also pipelines for making deep coadds and for making references.

Coaddition Pipeline

To make a deeper coadd image (e.g., a weekly coadd), first collect all the input images. One way to do this is with the Image.query_images() method, which wraps a set of SQL filters (target, section ID, filter, MJD range, and so on) to select the required images.

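from models.base import SmartSession
from models.image import Image

# query_images() only builds a SQLAlchemy statement; nothing is queried yet
stmt = Image.query_images(target='field_034', section_id='N1', filter='R', min_mjd=59000, max_mjd=59007)  # ... plus any other filters
with SmartSession() as session:
    images = session.scalars(stmt).all()  # execute the statement and collect the Image objects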

Notice that query_images() does not return Image objects but a SQLAlchemy statement, so you can append additional filters to the statement before executing it. However you obtain the images, use them to make a coadd image. Make sure each image has its extraction products loaded before starting the coaddition.
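To show what "a statement, not objects" buys you, here is a toy, dependency-free stand-in for that pattern. Everything in it is hypothetical: `ImageRow`, `Query`, and this `query_images` are illustrative names, and the `airmass` column is assumed only so that there is an extra filter to append. In the real code the statement is a SQLAlchemy select, whose `.where()` plays the role of `Query.where()` here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class ImageRow:
    # Hypothetical stand-in for one row of the images table
    target: str
    section_id: str
    filter: str
    mjd: float
    airmass: float  # assumed column, used only to show an appended filter


Predicate = Callable[[ImageRow], bool]


@dataclass(frozen=True)
class Query:
    """A lazy query: it only stores filters until it is executed."""
    predicates: Tuple[Predicate, ...] = ()

    def where(self, predicate: Predicate) -> "Query":
        # Return a new query with one more filter; self is left unchanged
        return Query(self.predicates + (predicate,))

    def execute(self, rows: Iterable[ImageRow]) -> List[ImageRow]:
        # Only here are the filters actually applied
        return [r for r in rows if all(p(r) for p in self.predicates)]


def query_images(target: Optional[str] = None, section_id: Optional[str] = None,
                 filter: Optional[str] = None, min_mjd: Optional[float] = None,
                 max_mjd: Optional[float] = None) -> Query:
    """Hypothetical analogue of Image.query_images(): build filters, run nothing."""
    q = Query()
    if target is not None:
        q = q.where(lambda im: im.target == target)
    if section_id is not None:
        q = q.where(lambda im: im.section_id == section_id)
    if filter is not None:
        q = q.where(lambda im: im.filter == filter)
    if min_mjd is not None:
        q = q.where(lambda im: im.mjd >= min_mjd)
    if max_mjd is not None:
        q = q.where(lambda im: im.mjd <= max_mjd)
    return q


rows = [
    ImageRow('field_034', 'N1', 'R', 59001.2, 1.1),
    ImageRow('field_034', 'N1', 'R', 59003.5, 2.3),
    ImageRow('field_034', 'N1', 'g', 59004.0, 1.2),
    ImageRow('field_034', 'N1', 'R', 59010.0, 1.0),
]
stmt = query_images(target='field_034', section_id='N1', filter='R', min_mjd=59000, max_mjd=59007)
stmt = stmt.where(lambda im: im.airmass < 2.0)  # append an extra filter before executing
images = stmt.execute(rows)
```

Because each `where()` returns a new query, the base statement can be reused and refined in different ways before anything is run.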

from pipeline.coaddition import CoaddPipeline

pipe = CoaddPipeline(...)  # use kwargs to override the config parameters
coadd_image = pipe.run(images)  # the output coadd image also has extraction products generated by the pipeline

Alternatively, the coaddition pipeline can be run with a set of named parameters specifying the place in the sky, the filter, and an optional time range. If the time range is not given, the end time defaults to the current time and the start time defaults to 7 days before the end time (the 7-day window is a configurable parameter of the coadd pipeline).
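The default window can be written out explicitly. The following is a minimal sketch. `default_mjd_range` and `window_days` are hypothetical names, since the text does not give the real config key. How a half-specified range (only one end given) is handled is also an assumption. The conversion uses the standard relation MJD = Unix time / 86400 + 40587, ignoring leap seconds.

```python
import time
from typing import Optional, Tuple

UNIX_EPOCH_MJD = 40587.0   # MJD of 1970-01-01T00:00:00 UTC
SECONDS_PER_DAY = 86400.0


def default_mjd_range(min_mjd: Optional[float] = None, max_mjd: Optional[float] = None,
                      window_days: float = 7.0,
                      now_unix: Optional[float] = None) -> Tuple[float, float]:
    """Fill in a missing coadd time range (sketch; names are hypothetical).

    window_days stands in for the coadd pipeline's configurable look-back.
    Assumption: each missing end is filled independently.
    """
    if max_mjd is None:
        # End time defaults to "now", converted from Unix seconds to MJD
        if now_unix is None:
            now_unix = time.time()
        max_mjd = now_unix / SECONDS_PER_DAY + UNIX_EPOCH_MJD
    if min_mjd is None:
        # Start time defaults to window_days before the end time
        min_mjd = max_mjd - window_days
    return min_mjd, max_mjd
```

For example, `default_mjd_range(max_mjd=59007.0)` gives the same 59000 to 59007 window used in the call above.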

coadd_image = pipe.run(ra=123.4, dec=56.7, filter='g', min_mjd=59000, max_mjd=59007)  # ... plus any other parameters

Currently, the coadd pipeline only runs extraction on the coadd image (i.e., it makes the source list, PSF, background, WCS, and zero point). It may be possible to add the remaining steps (detection, measuring, etc.), but in that case two things need to happen:

The coadd pipeline will have to be given a reference set name in the subtraction parameters (see below).
The coadd pipeline must still be able to run and save just the coadd image and its extraction products, so that the reference-making pipeline can keep using it internally.

Reference Pipeline

The reference-making pipeline builds on top of the coadd pipeline, but it also produces a Reference object and places the Provenance of the new reference in a RefSet object.

There are two sets of parameters (and code versions!) that determine the uniqueness of a reference set:

The data-production parameters and code versions that go into producing the individual images, their products, the coadd image made from them, and the extraction products of the coadd image. Together these form the upstream_hash of the RefSet, which is computed by hashing together the provenance hashes of the coadd image and its products. Because those provenances include the individual images’ provenances, any change along the data-production chain (individual image processing by the regular pipeline, or coaddition and extraction by the coaddition pipeline) changes the upstream_hash. Each RefSet has a unique upstream_hash value, because the coadd’s provenances go into the upstreams of the subtraction image: for the subtraction’s output provenance to be completely defined by the choice of parameters (here, the choice of ref-set name), each RefSet must map one-to-one to an upstream_hash.
The parameters (and code) that go into producing the Reference object itself, mostly the choice of which images to pick for the coaddition. These define the provenances of the Reference object. A single RefSet can hold multiple Reference provenances, as long as they all have the same upstream provenances (which go into the upstream_hash and into the subtraction upstreams). This lets a ref-set use one set of image-search criteria and, if that fails, fall back to a second set, usually with more relaxed limits on which images to pick: for a given place in the sky, you would try to make a reference using set 1, and if that fails, try set 2. There should always be exactly one reference for each place in the sky in each ref-set. Once a reference is successfully made, it is always the one that is loaded (unless it is marked as “bad”; what that would mean for changing a reference mid-survey still needs to be worked out). The goal is to prevent the reference from changing in the middle of a survey while letting the same ref-set cover places where the images are of lower quality. In principle, many Reference provenances can be appended to a RefSet, but adding more than one or two is not recommended.
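The first of these two parameter sets works by chaining provenances. The following is a minimal sketch of that idea. It assumes each provenance hash covers the step's code version, its parameters, and its upstreams' hashes, and it uses JSON plus SHA-256 with sorted inputs. That encoding is an illustrative choice, not necessarily the project's actual hashing scheme. The function names and the parameter values (`flat`, `threshold`, `method`) are hypothetical.

```python
import hashlib
import json
from typing import Iterable, Mapping


def provenance_hash(process: str, code_version: str, params: Mapping,
                    upstream_hashes: Iterable[str] = ()) -> str:
    """Hash a step's code version and parameters together with its upstreams' hashes."""
    payload = json.dumps(
        {
            "process": process,
            "code_version": code_version,
            "params": params,
            "upstreams": sorted(upstream_hashes),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def refset_upstream_hash(provenance_hashes: Iterable[str]) -> str:
    """Hash the coadd's and its products' provenance hashes into one value."""
    # Sorting makes the result independent of the order the hashes are listed in
    joined = "\n".join(sorted(provenance_hashes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_upstream_hash(preproc_params: Mapping, coadd_params: Mapping,
                        code_version: str = "0.1") -> str:
    # Steps run on each individual image by the regular pipeline
    preproc = provenance_hash("preprocessing", code_version, preproc_params)
    extraction = provenance_hash("extraction", code_version, {"threshold": 5.0}, [preproc])
    # Coaddition, then extraction on the coadd, by the coadd pipeline
    coadd = provenance_hash("coaddition", code_version, coadd_params, [preproc, extraction])
    coadd_extraction = provenance_hash("extraction", code_version, {"threshold": 5.0}, [coadd])
    return refset_upstream_hash([coadd, coadd_extraction])
```

Changing the preprocessing parameters, the coaddition parameters, or the code version all yield a different upstream_hash, because each change propagates through the chained hashes. This is why one RefSet name can only ever correspond to one upstream_hash.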
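The fallback behavior described in the second item (try the strict criteria first, then the relaxed ones, and never replace a reference once it exists) can be sketched as follows. The sketch is hypothetical throughout: `get_reference`, `make_ref`, the seeing-based selection, and the `max_seeing`/`min_images` keys only illustrate the control flow. References marked “bad” are deliberately not handled, since the text leaves that case open.

```python
from typing import Callable, Dict, Hashable, List, Mapping, Optional

Reference = Dict[str, object]
MakeRef = Callable[[Hashable, Mapping], Optional[Reference]]


def get_reference(location: Hashable, refset_refs: Dict[Hashable, Reference],
                  criteria_sets: List[Mapping], make_ref: MakeRef) -> Optional[Reference]:
    """One reference per location per ref-set, with fallback criteria (sketch)."""
    # An existing reference always wins, so it never changes mid-survey
    if location in refset_refs:
        return refset_refs[location]
    # Otherwise try each criteria set in order, strictest first
    for criteria in criteria_sets:
        ref = make_ref(location, criteria)
        if ref is not None:
            refset_refs[location] = ref
            return ref
    return None  # no criteria set could produce a reference here


# Hypothetical per-image seeing values for two places in the sky
seeing = {"field_A": [1.1, 1.3, 1.2, 1.4, 1.0], "field_B": [2.1, 1.9, 2.6, 1.2]}


def make_ref(location, criteria):
    # Hypothetical selection: keep images whose seeing is good enough
    chosen = [s for s in seeing[location] if s <= criteria["max_seeing"]]
    if len(chosen) < criteria["min_images"]:
        return None
    return {"location": location, "criteria": criteria["name"], "n_images": len(chosen)}


criteria_sets = [
    {"name": "strict", "max_seeing": 1.5, "min_images": 5},
    {"name": "relaxed", "max_seeing": 2.5, "min_images": 3},
]
refs = {}
ref_a = get_reference("field_A", refs, criteria_sets, make_ref)  # strict set succeeds
ref_b = get_reference("field_B", refs, criteria_sets, make_ref)  # falls back to relaxed
```

Calling `get_reference("field_A", ...)` again returns the same object even if new images arrive, which mirrors the one-reference-per-place rule.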

To produce a reference, use the RefMaker object:

from pipeline.ref_maker import RefMaker

maker = RefMaker(maker={'name': 'new_refset', 'instruments': ['PTF']})  # ... use kwargs to override other config parameters
ref = maker.run(ra=123.4, dec=56.7, filter='g')

The location in the sky can also be specified using target and section ID. The ref-maker checks whether a RefSet with that name already exists and tries to append the referencing Provenance to it if it is not already there. To disable appending new provenances to the ref-set, set allow_append=False on the ref-maker.

To produce the referencing provenance, the ref-maker first creates (or loads) the provenances for the individual images, including the preprocessing and extraction steps, and for the coaddition and extraction-on-coadd steps. These provenances are used as the upstreams of the reference provenance and form the upstream_hash. If the ref-maker’s parameters are inconsistent with the upstream_hash of an existing ref-set with the same name, it raises an error.
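The ref-set bookkeeping described in the last two paragraphs can be sketched with an in-memory registry. `RefSetSketch` and `attach_provenance` are hypothetical names, and only `allow_append` comes from the text. Two behaviors here are assumptions, not confirmed by the text. First, an unknown provenance with `allow_append=False` raises an error rather than being silently skipped. Second, a brand-new ref-set can still be created when appending is disabled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RefSetSketch:
    # Hypothetical in-memory stand-in for a RefSet
    name: str
    upstream_hash: str
    provenance_ids: List[str] = field(default_factory=list)


def attach_provenance(registry: Dict[str, RefSetSketch], name: str, upstream_hash: str,
                      ref_provenance_id: str, allow_append: bool = True) -> RefSetSketch:
    """Create a ref-set, or check and extend an existing one (sketch)."""
    refset = registry.get(name)
    if refset is None:
        # New ref-set: it is tied to this upstream_hash from now on
        refset = RefSetSketch(name, upstream_hash, [ref_provenance_id])
        registry[name] = refset
        return refset
    if refset.upstream_hash != upstream_hash:
        # Ref-set names must stay one-to-one with upstream hashes
        raise ValueError(f"RefSet {name!r} has upstream_hash {refset.upstream_hash}, "
                         f"but these parameters give {upstream_hash}")
    if ref_provenance_id in refset.provenance_ids:
        return refset  # already attached; nothing to do
    if not allow_append:
        # Assumption: refuse rather than silently skip
        raise ValueError(f"provenance {ref_provenance_id} is not in RefSet {name!r} "
                         "and appending is disabled")
    refset.provenance_ids.append(ref_provenance_id)
    return refset
```

The upstream check comes before the append check, so a mismatched upstream_hash is reported even when the provenance itself is already attached.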

Note that even though the ref-maker has the parameters (and even a full Pipeline object), it does not load exposures, nor does it run preprocessing and extraction to produce the individual images. Those images must already exist in the DB, along with their extraction products, all with the correct provenances. On the other hand, the ref-maker can run the coaddition pipeline internally if it doesn’t find the coadd image. If a coadd image and its products already exist with the correct provenance, they are loaded and reused to produce a reference. If both a coadd and a reference already exist, they are loaded directly. Note that a Reference object’s provenance contains the parameters used to choose which images go into the coaddition, but it does not name the RefSet it belongs to. Thus, multiple ref-sets can use the same Reference and the same underlying coadd image.
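The load-or-make order above can be summarized in a short sketch. The dictionary "store", the key layout, `load_or_make_reference`, and `run_coadd` are all hypothetical. The one point taken from the text is that the keys contain provenances and a location but no ref-set name, which is why different ref-sets can end up sharing a Reference and its coadd.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Store = Dict[str, Dict[Hashable, object]]


def load_or_make_reference(store: Store, location: Hashable, coadd_prov: str,
                           ref_prov: str,
                           run_coadd: Callable[[List[object]], object]) -> Tuple[object, str]:
    """Return (reference, what_happened) following the order described above (sketch)."""
    # 1. An existing reference with this provenance is loaded directly
    ref_key = (ref_prov, location)  # note: no ref-set name in the key
    if ref_key in store["references"]:
        return store["references"][ref_key], "loaded reference"

    # 2. An existing coadd with the right provenance is reused
    coadd_key = (coadd_prov, location)
    if coadd_key in store["coadds"]:
        coadd, status = store["coadds"][coadd_key], "reused coadd"
    else:
        # 3. Otherwise coadd the individual images, which must already exist
        #    (with extraction products); they are never produced here
        images = store["images"].get(location)
        if not images:
            raise LookupError(f"no processed images in the DB for {location!r}")
        coadd, status = run_coadd(images), "made coadd"
        store["coadds"][coadd_key] = coadd

    ref = {"location": location, "provenance": ref_prov, "coadd": coadd}
    store["references"][ref_key] = ref
    return ref, status
```

A second ref-set asking for the same reference provenance at the same location gets the stored object back, without running the coaddition again.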