Machine environment

This page provides useful information about the machine environments on which you can run the HSC pipeline.


HSC data analysis machine for open use (hanaco)

We provide a machine for HSC open-use users to analyze HSC data. The basic specification is shown below:

  Item             Spec
  CPU              x86_64
  Number of cores  32
  Memory           256 GB
  HDD              36 TB x 2

0. Application

If you want to use hanaco, please submit an application via the Application Form. Your account information will be sent by e-mail within 3 working days; if it does not arrive, please contact the administrator (yukie.oishi@nao.ac.jp).

1. Login to hanaco

Log in to hanaco as shown below. If you access hanaco from outside NAOJ, you need to register for the VPN service first.

ssh -X -A hoge@hanaco.ana.nao.ac.jp
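
If you log in often, an entry in ~/.ssh/config on your local machine can shorten the command. This is only a convenience sketch; the alias name is arbitrary and hoge stands for your account name.

# Example ~/.ssh/config entry on your local machine (alias name is arbitrary)
Host hanaco
    HostName hanaco.ana.nao.ac.jp
    User hoge
    ForwardX11 yes
    ForwardAgent yes

# With this entry you can simply run
ssh hanaco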

2. Operating precautions for hanaco

  • Basic Concept
    hanaco is provided for users who cannot set up an analysis environment for HSC data by themselves. If you do not have such an environment, you can use hanaco for data reduction and image/catalog creation with the HSC pipeline. After catalog creation, please move the output data to your own machine and analyze it there (an example copy command is shown after this list). Basically, you can use hanaco for 6 months; after this period, your data might be deleted (※1), so please prepare storage to back up your data. If you cannot do that, you can use the machines for common use in NAOJ.
    (※1 We will inform you a few weeks before the date of data deletion or disk cleanup.)
  • Expiration Date
    As mentioned above, a hanaco account is valid for 6 months. You may be able to use it longer if there is enough disk space on hanaco and your analysis is not yet complete. Please contact yukie.oishi@nao.ac.jp if you will not finish data reduction within 6 months.
  • Working Directory
    Perform all HSC pipeline analysis under /data/<User Name>/ or /data2/<User Name>/, not in your home directory.
    # Making your working directory in /data
    mkdir /data/<User Name>/
    
    If you want to check the available disk space, you can use the following command.
    df -h
    # Output
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1        63G   45G   16G  75% /
    tmpfs           127G   33M  127G   1% /dev/shm
    /dev/sdb3        16G   44M   16G   1% /work
    /dev/sdc1        33T   31T  1.9T  95% /data
    /dev/sda1        28T  1.1T   27T   4% /data2
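
When your analysis is complete (or before the 6-month expiration), copy the pipeline outputs back to your own machine. The rsync command below is a minimal sketch; the source and destination paths are placeholders and should be adjusted to your environment.

# Run on your own machine: copy output data from hanaco (paths are placeholders)
rsync -avz hoge@hanaco.ana.nao.ac.jp:/data/<User Name>/<output directory>/ /path/to/your/backup/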
    

3. Reminder for HSC pipeline on hanaco

HSC pipeline versions 4.0.5, 4.0.4, 4.0.2, 4.0.1, 3.8.5, and 3.8.1 are installed under /data/ana/hscpipe; the latest one is 4.0.5. Versions 4.0.2 and later support i2-filter analysis, and versions 4.0.5 and later support the r2 filter.

Warning

If you analyze data taken after June 2016, you need to use version 4.0.4 or later.

To check which version of the pipeline is loaded, look at .bashrc in your home directory. The following example shows how to change the loaded pipeline version from 4.0.2 to 4.0.5:

# .bashrc
...
# source /data/ana/hscpipe/4.0.2/bashrc <- comment out
source /data/ana/hscpipe/4.0.5/bashrc

You only need to run the commands below to set up the HSC pipeline environment.

# Setup HSC pipeline
setup-hscpipe

# Setup astrometry catalog
setup astrometry_net_data sdss-dr9-fink-v5b
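
To confirm that the environment is loaded correctly, you can check that a pipeline command is found on your PATH and which products are currently set up. This is a sketch: eups is the package manager bundled with the pipeline, and the product name hscPipe in the grep pattern is an assumption.

# Check that pipeline commands are on the PATH
which reduceFrames.py

# List the products currently set up (hscPipe product name is assumed)
eups list -s | grep -e hscPipe -e astrometry_net_data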

If you want to define your own commands, copy the directory containing the original source to your working directory and edit it there. The procedure is shown below:

# Copy a directory including target source to your working directory
# Original directory /data/ana/hscpipe/3.8.5/Linux64/pipe_tasks/HSC-3.9.0
mkdir -p /data/<User Name>/ana/hscpipe/3.8.5/Linux64/pipe_tasks
cp -r /data/ana/hscpipe/3.8.5/Linux64/pipe_tasks/HSC-3.9.0 /data/<User Name>/ana/hscpipe/3.8.5/Linux64/pipe_tasks

## Edit the source code ##

# Set up and apply the user-defined commands and parameters
# Execute the following command every time you use the pipeline
setup -jr /data/<User Name>/ana/hscpipe/3.8.5/Linux64/pipe_tasks/HSC-3.9.0
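
After running setup -jr, you can verify that your copied pipe_tasks is the one actually in use, for example with eups; a local setup is typically listed with a LOCAL: tag. This is a sketch, assuming the standard eups tooling bundled with hscPipe.

# Check which pipe_tasks version is set up (your copy should be marked as LOCAL)
eups list pipe_tasks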

The open-use computer system maintained by Astronomy Data Center (ADC)/NAOJ

You can also use the open-use PCs of ADC/NAOJ for HSC data reduction. Some tips for analyzing data on these PCs are given below. Please read the User’s Guide for the basic rules of using the open-use PCs of ADC/NAOJ.

0. Preparation

You need an account for the open-use PCs of ADC/NAOJ. You can apply via the User application form.

1. Work Disk

We provide 16 TB work disks named /mfst[01-04][a-e]. Please make your own directory on one of these disks and perform data reduction there. You can check the amount of free disk space and the disk status here.

# Example
mkdir /mfst01b/<User Name>

Warning

Analysis with the HSC pipeline requires a large amount of disk space. Please check the available disk space here before data reduction and run on a disk with enough free space.
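
Besides the status page, you can also check a candidate disk from the command line; the disk name below is just an example.

# Check free space on a candidate work disk
df -h /mfst01b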

2. Installation of HSC pipeline

The open-use PCs run Red Hat Enterprise Linux 6, so the CentOS 6 binary package of the HSC pipeline (see HSC pipeline installation) should be installed.

However, the binary distribution server cannot be accessed from the open-use PCs, so please download the package and the astrometry catalog file in one of the following ways:

  1. Download the files to your own PC, then copy them to the open-use PC with scp, or
  2. Download them directly to the open-use PC, relaying through your own PC.
# For Case 2, you can use one of the following commands;
# wget
wget --no-check-certificate -O - https://[pipeline URL] | ssh [User name]@anam01.ana.nao.ac.jp 'cat > /mfst01b/hoge/pipe.tar.xz'

# curl
curl --insecure https://[pipeline URL] | ssh [User name]@anam01.ana.nao.ac.jp 'cat > /mfst01b/hoge/pipe.tar.xz'
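
For Case 1, after downloading the files on your own PC, you can copy them to the open-use PC with scp. The file name pipe.tar.xz and the destination directory are placeholders; copy the astrometry catalog file in the same way.

# For Case 1: copy the downloaded package from your own PC to the open-use PC
scp pipe.tar.xz [User name]@anam01.ana.nao.ac.jp:/mfst01b/hoge/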

3. Server and Queue information for HSC pipeline execution

On the open-use PCs, processing finishes in a relatively short time when you use the q16m queue on the batch servers (bapm) and the /tmp[A,B] region. /tmp[A,B] is visible from the analysis servers (anam) as /wbm[01-06][a,b] in read-only mode. Please put your files on /tmp[A,B], then execute the batch process from your working directory with a PBS script that refers to the files under /tmp[A,B]. The detailed configuration and architecture are described here.
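
As a minimal sketch (the path is a placeholder), prepare your own area on /tmp[A,B] before submitting batch jobs:

# Make your own area on /tmpB (example)
mkdir -p /tmpB/<User Name>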

4. PBS Batch Processing

PBS batch processing is built into the open-use system. Using the q16m queue, which has the largest number of cores, batch processing runs with 16 cores per node. You need to prepare a PBS batch script to run HSC pipeline batch processing.

In the HSC pipeline, batch processing is available for reduceBias.py, reduceDark.py, reduceFlat.py, reduceFrames.py, stack.py, multiBand.py, and forcedCcd.py. The following example is a PBS batch script for reduceFrames.py.

# Preparing batch script using dry-run
reduceFrames.py /mfst01b/<User Name>/hsc --calib=/mfst01b/<User Name>/hsc/CALIB --output=/tmpB/<User Name> --id filter=HSC-I visit=902798..902808:2 --config processCcd.isr.doFringe=False --dry-run --clobber-config

# Parameters:
#   --output         :In the example above, the /tmp[A,B] region is specified so that the process finishes in a short time. --rerun=dith_16h_test can be used instead of --output, but it is slower.
#   --dry-run        :Dry run that only creates the PBS script.
#   --clobber-config :Run the command even if the configuration recorded for the rerun differs.

When you add --dry-run to the command, the generated batch script is written under /var/tmp/. Copy it to your own directory and edit it before use.

# Copy (or move) the --dry-run result to your working directory
cp /var/tmp/tmph0gE /mfst01b/<User Name>/work/


# Edit the tmph0gE file (PBS batch script).
# The generated script contains some default comments;
# please delete them all, then add the lines below.
:
:
#!/bin/bash
#PBS -m abe
#PBS -q q16m
#PBS -l select=1:ncpus=16:mpiprocs=16:mem=20gb
#PBS -l walltime=336:00:00
#PBS -M hoge@nao.ac.jp


# To make a log file for tracking the batch process,
# add the following lines after the PBS directives above.
:
:
{

        (pipeline commands)

} &> /mfst01b/hoge/work/tmph0gE.log
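
Putting the pieces together, the edited script might look like the sketch below. Keep the pipeline command block that --dry-run generated; only the mail address and the log path (placeholders here) need to be adjusted.

#!/bin/bash
#PBS -m abe
#PBS -q q16m
#PBS -l select=1:ncpus=16:mpiprocs=16:mem=20gb
#PBS -l walltime=336:00:00
#PBS -M hoge@nao.ac.jp

{

        (pipeline commands generated by --dry-run)

} &> /mfst01b/hoge/work/tmph0gE.log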

Please refer to the Data Analysis System User’s Guide for detailed PBS options.

Note that:

  • Specify the q16m queue with -q q16m.
  • Request 1 node with 16 cores using -l select=1:ncpus=16:mpiprocs=16:mem=20gb.
  • Set the maximum wall-clock time for a running job with -l walltime=336:00:00.

After preparing PBS batch script, run the following commands to perform it.

# Run PBS batch script
qsub -V /mfst01b/hoge/work/tmph0gE

The progress of the script is logged to tmph0gE.log, and you can check it with the following command:

# Output appended data as the file grows
tail -f tmph0gE.log
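
You can also check the state of the submitted job with the standard PBS command qstat; the user name below is a placeholder.

# Check the status of your submitted jobs
qstat -u <User Name>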

5. Estimated required time with open-use PC

The estimated processing times using the above batch processing are shown below (with -l select=1:ncpus=16:mpiprocs=16:mem=4gb):

  • reduceBias.py         : 17 minutes for 5 visits
  • reduceDark.py         : 4 minutes for 1 visit
  • reduceFlat.py         : 37 minutes for 8 visits
  • reduceFrames.py       : 106 minutes for 6 visits
  • makeDiscreteSkyMap.py : a few minutes
  • mosaic.py             : a few minutes for 6 visits
  • stack.py              : 311 minutes for 6 visits

Batch Processing other than PBS

Batch systems other than PBS are also available. Although PBS is the default in the HSC pipeline, you can select another system with a command-line option. Please use whichever batch system is available on your machine.

# In case of use of reduceFrames.py on SLURM.
reduceFrames.py ~/hsc --calib=~/hsc/CALIB --rerun=dith_16h_test --id filter=HSC-I visit=902798..902808:2 --config processCcd.isr.doFringe=False --batch-type=slurm

# Parameter:
#   --batch-type :Specify the batch system. You can choose from {slurm, pbs, smp}.
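
If you use SLURM, the submitted jobs can be monitored with the standard squeue command; this is a sketch assuming the SLURM client tools are installed on your machine.

# Monitor your SLURM jobs
squeue -u <User Name>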