HSC pipeline command


General info about HSC pipeline commands

  • ROOT
    Path to input data repository.
  • -h, --help
    Show the help message.
  • --calib=<directory for detrend data>
    Path to input calibration repository. If you do not specify this directory, the pipeline can still proceed with the analysis and creates the detrend data in <directory containing the _mapper file>/CALIB, e.g. $home/hsc/CALIB.
  • --output=<directory for output data>
    Path to output data repository. If you specify a rerun, you do not need to set this option. It is useful when you want to store the output of a certain rerun in a different directory.
  • --rerun=<rerun name>
    Name of a particular analysis run. Basically, the HSC pipeline reads the data under the <rerun name> directory and writes its products to the same directory. If you perform data reduction with different parameters, treat it as a different rerun.
  • -c <config item>=<parameter value>, --config <config item>=<parameter value>
    Config override option, e.g. -c foo=newfoo bar.baz=3. If you have many config parameters to change, you can load a config file that you have prepared, using the -C option.
  • -C <config file name>, --configfile <config file name>
    Config override file option. Write one parameter per line in the file.
  • -L <log level>, --loglevel <log level>
    Specify the level of logging messages. Choices are DEBUG (ridiculous numbers of messages), INFO (basic information; default), WARN (only warnings), FATAL (only complete pipeline failures).
  • --debug
    Enable debugging output.
  • --doraise
    Raise an exception on error. Use this option if you want an exception to be raised rather than only logged.
  • --logdest LOGDEST
    Specify a file to which log messages are copied (they still appear on the terminal).
  • --show [{config,data,tasks,run} [{config,data,tasks,run} ...]]
    Display the specified information. The most useful choice is --show config, which prints all the config parameters on the terminal (for details, see Arguments and parameters of commands). It can also be used to extract only parameters matching a specific keyword, e.g. to show only the items matching 'background', use --show config=*background*. Also useful is --show tasks, which prints the tasks used by the command you are running. See the examples after this list.
  • -j <number of processes>, --processes <number of processes>
    Number of processes to use. This can be used to perform multiprocessing on a single node.
  • -t <process timeout [sec]>, --process-timeout <process timeout [sec]>
    Timeout for multiprocessing; maximum wall time in seconds.
  • --clobber-output
    Remove and re-create the output directory if it already exists. Because this step finishes before multiprocessing starts, the -j option can also be given when you execute the command.
  • --clobber-config
    Back up your existing config (files in <data_repo>/config/ are moved: <foo> -> <foo>~1) and overwrite it. Every time you run a pipeline command, all config parameters are stored in the repository. If you have changed something that would make your data inhomogeneous, the pipeline will refuse to run; however, you can force it to run by adding --clobber-config.
  • --id <Key>=<parameter value> <Key>=<parameter value> ...
    Data ID that you want to process, e.g. --id visit=12345 ccd=1,2
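
As an illustration of how these options combine, the following sketch uses stack.py with the same repository and rerun names as the examples later in this section; the data IDs, the number of processes and the log file name are only examples.

# Print all config parameters of stack.py and quit without processing
stack.py $home/hsc --rerun=dith_16h_test --show config

# Print only the config parameters whose names match 'background'
stack.py $home/hsc --rerun=dith_16h_test --show config=*background*

# Print the tasks that stack.py would run
stack.py $home/hsc --rerun=dith_16h_test --show tasks

# Run with 8 processes on a single node and copy the log messages to a file
stack.py $home/hsc --rerun=dith_16h_test --id filter=HSC-I tract=0 \
    --selectId visit=902798..902808:2 ccd=0..103 -j 8 --logdest stack.log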

Output data to different rerun or directory

For stack.py, there are several ways, shown below, to output the coadd data to a different rerun or directory.

  • Separate <rerun for data input> and <rerun for data output> with a colon (:)

    # Input data from dith_16h_test directory, then output coadd data to dith_16h_output
    stack.py $home/hsc --rerun=dith_16h_test:dith_16h_output --id filter=HSC-I tract=0 --selectId visit=902798..902808:2 ccd=0..103
    
    # Usage:
    #   stack.py <directory for analysis> --rerun=<rerun for data input>:<rerun for data output>
    
  • Specify the full path of the rerun for data input as <directory for analysis>, and give only the rerun for data output with --rerun

    # Input data from dith_16h_test, then output coadd data to the rerun/dith_16h_test/rerun/dith_16h_output directory
    stack.py $home/hsc/rerun/dith_16h_test --rerun=dith_16h_output --id filter=HSC-I tract=0 --selectId visit=902798..902808:2 ccd=0..103
    
    # Usage:
    #   stack.py <rerun for data input> --rerun=<rerun for data output>
    #
    # !!NOTE!! In this case, the new rerun is created under $home/hsc/rerun/dith_16h_test/rerun, not $home/hsc/rerun
    
  • Use --output

    # Input data from dith_16h_test directory, then output coadd data to dith_16h_output
    stack.py $home/hsc/rerun/dith_16h_test --output=$home/hsc/rerun/dith_16h_output --id filter=HSC-I tract=0 --selectId visit=902798..902808:2 ccd=0..103
    
    # Usage:
    #   stack.py <rerun for data input> --output=<rerun for data output (full path)>
    #
    # !!NOTE!! Specify the full path of rerun for data output
    

If you have already run stack.py, the HSC pipeline reuses the existing warp-*.fits files in the rerun/[rerun]/deepCoadd/[filter]/[tract]/[patch] directory to combine data and detect objects, and then writes the coadd data to the output rerun. If you want to create the warp data again, move or rename the corresponding directory under the existing deepCoadd directory before running stack.py, for example:

# Change directory name under deepCoadd
mv $home/hsc/rerun/dith_16h_test/deepCoadd/HSC-I $home/hsc/rerun/dith_16h_test/deepCoadd/tmp_HSC-I

Arguments and parameters of commands

You can check arguments and parameters from the help message or from the config file. The following is an example.

# Show help
# -h is also used
mosaic.py --help

# Show the contents of config file
# Specify the directory containing the _mapper file after mosaic.py
mosaic.py $home/HSC/ --show config

In the case of mosaic.py, the following settings are printed on the terminal.

positional arguments:
        ROOT                  path to input data repository, relative to
                              $PIPE_INPUT_ROOT

optional arguments:
        -h, --help            show this help message and exit
        --calib RAWCALIB      path to input calibration repository, relative to
                              $PIPE_CALIB_ROOT
        --output RAWOUTPUT    path to output data repository (need not exist),
                              relative to $PIPE_OUTPUT_ROOT
        --rerun [INPUT:]OUTPUT
                              rerun name: sets OUTPUT to ROOT/rerun/OUTPUT;
                              optionally sets ROOT to ROOT/rerun/INPUT
        -c [NAME=VALUE [NAME=VALUE ...]], --config [NAME=VALUE [NAME=VALUE ...]]
                              config override(s), e.g. -c foo=newfoo bar.baz=3
        -C [CONFIGFILE [CONFIGFILE ...]], --configfile [CONFIGFILE [CONFIGFILE ...]]
                              config override file(s)
        -L LOGLEVEL, --loglevel LOGLEVEL
                              logging level
        -T [COMPONENT=LEVEL [COMPONENT=LEVEL ...]], --trace [COMPONENT=LEVEL [COMPONENT=LEVEL ...]]
                              trace level for component
        --debug               enable debugging output?
        --doraise             raise an exception on error (else log a message and
                              continue)?
        --logdest LOGDEST     logging destination
        --show SHOW [SHOW ...]
                              display the specified information to stdout and quit
                              (unless run is specified).
        -j PROCESSES, --processes PROCESSES
                              Number of processes to use
        -t PROCESSTIMEOUT, --process-timeout PROCESSTIMEOUT
                              Timeout for multiprocessing; maximum wall time (sec)
        --clobber-output      remove and re-create the output directory if it
                              already exists (safe with -j, but not all other forms
                              of parallel execution)
        --clobber-config      backup and then overwrite existing config files
                              instead of checking them (safe with -j, but not all
                              other forms of parallel execution)
        --no-backup-config    Don't copy config and eups versions to file~N backup.
        --id [KEY=VALUE1[^VALUE2[^VALUE3...] [KEY=VALUE1[^VALUE2[^VALUE3...] ...]]
                              data ID, with raw CCD keys + tract

Notes:
        * the --rerun option allows the input and output directories to be specified
          simultaneously, and relative to the same root.
        * --config, --configfile, --id, --trace and @file may appear multiple times;
          all values are used, in order left to right
        * @file reads command-line options from the specified file:
                * data may be distributed among multiple lines (e.g. one option per line)
                * data after # is treated as a comment and ignored
                * blank lines and lines starting with # are ignored
        * To specify multiple values for an option, do not use = after the option name:
                * wrong: --configfile=foo bar
                * right: --configfile foo bar
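
As the notes above describe, command-line options can also be collected in a file and passed with @. The sketch below is only an illustration; the file name and the option values are arbitrary, and the options themselves are those listed in the help output above.

# Contents of a file named options.txt: one option per line;
# blank lines and text after # are ignored
--rerun=dith_16h_test
--id tract=0 ccd=0..103
-j 4

# Pass the file to the command with @
mosaic.py $home/HSC @options.txt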

Command editing

If you want to edit the behaviour of a certain command, you can find the corresponding Python script in the following way. As an example, suppose you want to rewrite part of makeDiscreteSkyMap.py.

  1. Confirm the directory storing makeDiscreteSkyMap.py

    which makeDiscreteSkyMap.py
    
    # Result
    # >> $HOME/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0/bin/makeDiscreteSkyMap.py
    
  2. Check the content of makeDiscreteSkyMap.py

    cat $HOME/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0/bin/makeDiscreteSkyMap.py
    
    # Content of makeDiscreteSkyMap.py
    """
    #!/usr/bin/env python
    #
    # LSST Data Management System
    # Copyright 2008, 2009, 2010 LSST Corporation.
    #
    # This product includes software developed by the
    # LSST Project (http://www.lsst.org/).
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.    See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the LSST License Statement and
    # the GNU General Public License along with this program.  If not,
    # see <http://www.lsstcorp.org/LegalNotices/>.
    #
    from lsst.pipe.tasks.makeDiscreteSkyMap import MakeDiscreteSkyMapTask
    
    MakeDiscreteSkyMapTask.parseAndRun()
    """
    
  3. Check which module is read by the command
    In the case of makeDiscreteSkyMap.py, the class 'MakeDiscreteSkyMapTask' is imported from the module 'lsst.pipe.tasks.makeDiscreteSkyMap'.
  4. Move to the directory that contains the module.
    The environment variable pointing to the package that holds the module is derived from the two words following 'lsst' in the module name. In the case above, the package directory is $PIPE_TASKS_DIR, and the Python module and class are stored in $PIPE_TASKS_DIR/python/lsst/pipe/tasks (see also the one-line check after this list).
    cd $PIPE_TASKS_DIR/python/lsst/pipe/tasks
    
  5. Edit the script used by the command.
    For makeDiscreteSkyMap.py, edit the class 'MakeDiscreteSkyMapTask' defined in this file.
    vi makeDiscreteSkyMap.py
    
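
If you are not sure where a module actually lives, you can also ask Python directly; this is just a convenience check and assumes the hscPipe environment has already been set up.

# Print the file that Python imports for the module used by makeDiscreteSkyMap.py
python -c "import lsst.pipe.tasks.makeDiscreteSkyMap as m; print(m.__file__)"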

If you cannot edit the source file in place (for example, because the pipeline is installed under /opt and you do not have write permission), follow the steps below: copy the package containing /opt/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0/python/lsst/pipe/tasks/calibrate.py to your home directory as shown, edit it there, and set it up with eups.

$ mkdir -p ~/opt/hscpipe/3.7.3/Linux64/pipe_tasks
$ cp -r /opt/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0 ~/opt/hscpipe/3.7.3/Linux64/pipe_tasks/
#
# Edit the copied source code (e.g. ~/opt/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0/python/lsst/pipe/tasks/calibrate.py),
# then set up the local copy of the package
$ setup -jr ~/opt/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0

Run the 'setup -jr ...' command every time before you execute the pipeline commands (e.g. in each new shell session), so that your edited package is the one that is used. Confirm the environment:

$ eups list
  pipe_tasks            LOCAL:/home/<User name>/opt/hscpipe/3.7.3/Linux64/pipe_tasks/HSC-3.9.0   setup