This document lists some useful examples about using MIDP.

1. Preparation

1.1. Package installation

The used python version is 3.7. One can install this package locally by cloning the repo and running

python setup.py install --user

or install it by pip

pip install git+https://github.com/YuanYuYuan/MIDP

1.2. Download sample dataset

In the following examples, we use the PDDCA dataset as the sample dataset, and name it as data.

The structure of PDDCA dataset is like the following and in NRRD data format(.nrrd).

data
├── 0522c0002
│   ├── img.nrrd
│   └── structures
│       ├── Brainstem.nrrd
│       ├── Chiasm.nrrd
│       ├── Mandible.nrrd
│       ├── OpticNerve_L.nrrd
│       ├── OpticNerve_R.nrrd
│       ├── Parotid_L.nrrd
│       ├── Parotid_R.nrrd
│       ├── Submandibular_L.nrrd
│       └── Submandibular_R.nrrd
In here we change the default "BrainStem" to "Brainstem".

2. Tools

2.1. Data conversion

2.1.1. Resample the NRRD into consistent spacing

In consideration of fast model training under some consistent spacing, we’d better resample the dataset and store it as a new fixed data.
Run the following command to convert the dataset data into new dataset in spacing 1, and store them at the directory resampeld_data.

./resample_nrrd_dataset.py --data data --spacing 1 --output-dir resampled_data

Although this program runs with multi processes, it might still take several minutes to finish it.

2.1.2. NRRD → NIfTI

The following one demos how to transfer the NRRD dataset to NIfTI dataset

./nrrd2nifti.py --config configs/nrrd2nifti.yaml

with a sample configuration configs/nrrd2nifti.yaml.

configs/nrrd2nifti.yaml
# Input directory containing NRRD data
data_dir: ./data

# Output directory storing NIfTI data
output_dir: ./nifti

# Unit: mm
spacing: 1

# TODO: correct ROIs to classes
# Map each structure to specified value
# For PDDCA dataset, there're
#   Mandible        : 1
#   Brainstem       : 2
#   Parotid_L       : 3
#   Parotid_R       : 4
#   Submandibular_L : 5
#   Submandibular_R : 6
#   OpticNerve_L    : 7
#   OpticNerve_R    : 8
#   Chiasm          : 9
# But note that some cases in PDDCA dataset may not contain all structures.
roi_map:
  Brainstem       : 1
  Parotid_L       : 2
  Parotid_R       : 3

After converted, the output in NIfTI format should be located at the output_dir (nifti in this example).

2.1.3. NIfTI → NRRD

When doing backward conversion from NIfTI to NRRD, it requires the information of the original dataset, so you should specify the source data directory(data in this example).
The following one demo how to use the program to convert the label from NIfTI to NRRD.

./nifti2nrrd.py --nrrd-dir data --nifti-dir nifti/labels --output-dir restored

2.2. For training

2.2.1. Generate data list for training

  • Generate data list by parsing loader.

    ./generate_data_list.py \
        --loader-config configs/parsing_loader.yaml \
        --output data_list.yaml
    [Example]
    data_list.yaml
    amount:
      test: 0
      total: 48
      train: 33
      valid: 15
    list:
      test: []
      train:
      - 0522c0001
      - 0522c0002
      - 0522c0009
      - 0522c0013
      - 0522c0014
      - 0522c0017
      - 0522c0070
      - 0522c0077
      - 0522c0079
      - 0522c0147
      - 0522c0161
      - 0522c0195
      - 0522c0248
      - 0522c0251
      - 0522c0253
      - 0522c0328
      - 0522c0329
      - 0522c0330
      - 0522c0427
      - 0522c0433
      - 0522c0455
      - 0522c0479
      - 0522c0576
      - 0522c0598
      - 0522c0659
      - 0522c0661
      - 0522c0667
      - 0522c0669
      - 0522c0708
      - 0522c0746
      - 0522c0788
      - 0522c0806
      - 0522c0878
      valid:
      - 0522c0003
      - 0522c0057
      - 0522c0081
      - 0522c0125
      - 0522c0132
      - 0522c0149
      - 0522c0159
      - 0522c0190
      - 0522c0226
      - 0522c0441
      - 0522c0457
      - 0522c0555
      - 0522c0727
      - 0522c0845
      - 0522c0857
    loader:
      name: ParsingLoader
      parser_config:
      - ROIs:
        - Brainstem
        - Parotid_L
        - Parotid_R
        data_dir: data
        name: PDDCAParser
        preprocess_image: false
  • Generate data list by nifti loader.

    ./generate_data_list.py \
        --loader-config configs/nifti_loader.yaml \
        --output data_list.yaml
    [Example]
    data_list.yaml
    amount:
      test: 0
      total: 2
      train: 1
      valid: 1
    list:
      test: []
      train:
      - 0522c0598
      valid:
      - 0522c0576
    loader:
      data_dir: nifti
      name: NIfTILoader
  • Generate data list by NRRD loader.

    ./generate_data_list.py \
        --loader-config configs/nrrd_loader.yaml \
        --output data_list.yaml
    [Example]
    data_list.yaml
    amount:
      test: 0
      total: 48
      train: 33
      valid: 15
    list:
      test: []
      train:
      - 0522c0002
      - 0522c0013
      - 0522c0014
      - 0522c0057
      - 0522c0081
      - 0522c0125
      - 0522c0132
      - 0522c0147
      - 0522c0149
      - 0522c0159
      - 0522c0161
      - 0522c0190
      - 0522c0195
      - 0522c0248
      - 0522c0251
      - 0522c0329
      - 0522c0330
      - 0522c0433
      - 0522c0441
      - 0522c0457
      - 0522c0479
      - 0522c0555
      - 0522c0576
      - 0522c0598
      - 0522c0661
      - 0522c0667
      - 0522c0669
      - 0522c0708
      - 0522c0727
      - 0522c0788
      - 0522c0806
      - 0522c0845
      - 0522c0878
      valid:
      - 0522c0001
      - 0522c0003
      - 0522c0009
      - 0522c0017
      - 0522c0070
      - 0522c0077
      - 0522c0079
      - 0522c0226
      - 0522c0253
      - 0522c0328
      - 0522c0427
      - 0522c0455
      - 0522c0659
      - 0522c0746
      - 0522c0857
    loader:
      data_dir: data
      name: NRRDLoader
      roi_map:
        Brainstem: 1
        Parotid_L: 2
        Parotid_R: 3
  • Generate data list by NRRD loader and toggle resampling.

    ./generate_data_list.py \
        --loader-config configs/nrrd_loader_resample.yaml \
        --output data_list.yaml
    [Example]
    data_list.yaml
    amount:
      test: 0
      total: 48
      train: 33
      valid: 15
    list:
      test: []
      train:
      - 0522c0001
      - 0522c0013
      - 0522c0014
      - 0522c0017
      - 0522c0057
      - 0522c0070
      - 0522c0077
      - 0522c0079
      - 0522c0081
      - 0522c0125
      - 0522c0132
      - 0522c0147
      - 0522c0149
      - 0522c0159
      - 0522c0195
      - 0522c0226
      - 0522c0248
      - 0522c0251
      - 0522c0330
      - 0522c0427
      - 0522c0433
      - 0522c0441
      - 0522c0455
      - 0522c0457
      - 0522c0479
      - 0522c0555
      - 0522c0576
      - 0522c0598
      - 0522c0667
      - 0522c0727
      - 0522c0788
      - 0522c0806
      - 0522c0857
      valid:
      - 0522c0002
      - 0522c0003
      - 0522c0009
      - 0522c0161
      - 0522c0190
      - 0522c0253
      - 0522c0328
      - 0522c0329
      - 0522c0659
      - 0522c0661
      - 0522c0669
      - 0522c0708
      - 0522c0746
      - 0522c0845
      - 0522c0878
    loader:
      data_dir: data
      name: NRRDLoader
      resample: true
      roi_map:
        Brainstem: 1
        Parotid_L: 2
        Parotid_R: 3
      spacing: 1
      test: false
  • Generate data list by NRRD loader and split it into training/validation/testing 3 parts with a ratio 6:2:2.

    ./generate_data_list.py \
    	--loader-config configs/nrrd_loader.yaml \
    	--output train_valid_test_list.yaml \
    	--split-ratio 0.6 0.2
    [Example]
    train_valid_test_list.yaml
    amount:
      test: 11
      total: 48
      train: 28
      valid: 9
    list:
      test:
      - 0522c0003
      - 0522c0070
      - 0522c0132
      - 0522c0147
      - 0522c0161
      - 0522c0427
      - 0522c0433
      - 0522c0441
      - 0522c0661
      - 0522c0845
      - 0522c0878
      train:
      - 0522c0001
      - 0522c0002
      - 0522c0009
      - 0522c0013
      - 0522c0014
      - 0522c0017
      - 0522c0077
      - 0522c0079
      - 0522c0081
      - 0522c0149
      - 0522c0190
      - 0522c0195
      - 0522c0248
      - 0522c0253
      - 0522c0328
      - 0522c0329
      - 0522c0330
      - 0522c0455
      - 0522c0457
      - 0522c0479
      - 0522c0555
      - 0522c0659
      - 0522c0667
      - 0522c0669
      - 0522c0708
      - 0522c0727
      - 0522c0746
      - 0522c0857
      valid:
      - 0522c0057
      - 0522c0125
      - 0522c0159
      - 0522c0226
      - 0522c0251
      - 0522c0576
      - 0522c0598
      - 0522c0788
      - 0522c0806
    loader:
      data_dir: data
      name: NRRDLoader
      roi_map:
        Brainstem: 1
        Parotid_L: 2
        Parotid_R: 3

2.2.2. Training pipeline design

A sample structure has been written in sample_train.py with a example config configs/training.yaml.

./sample_train.py --config configs/training.yaml

Note that the config/training.yaml assumes a data list file at data_list.yaml.

2.3. Find bounding boxes

You can modify the data loader to specify which structures to enclose, and choose a proper padding to obtain a larger box.

./find_box.py \
    --config configs/nrrd_loader_eyes.yaml \
    --output bbox.json \
    --padding 20

After finished, you will obtain a bbox.json file, which can be used in the 2nd stage segmentation.

3. Testing

3.1. Data loader

Directly use parser to load data

./sample_loader.py --loader-config configs/parsing_loader.yaml

or use NIfTI data loader(but required conversion beforehand).

./sample_loader.py --loader-config configs/nifti_loader.yaml

This program will launch a viewer of the data, you can scroll the mouse wheel to change the slice.

sample loader

3.2. Data generator

Directly use parser to load data

./sample_generator.py \
    --loader-config configs/parsing_loader.yaml \
    --generator-config configs/generator.yaml \
    --output-dir outputs

or use NIfTI data loader(but required conversion beforehand).

./sample_generator.py \
    --loader-config configs/nifti_loader.yaml \
    --generator-config configs/generator.yaml \
    --output-dir outputs

with a example generator config configs/generator.yaml.

BlockSampler   :                  # sample a 3D block from each data(3D image)
  shuffle      : True             # shuffle the data list
  block_shape  : [128, 128, 30]   # the shape of observing area, i.e shape of image
  out_shape    : [96, 96, 20]     # the valid shape of the block, i.e. shape of label
  n_samples    : 32               # number of samples of each data
  ratios       : [0, 1, 2, 2]     # sampling probability of each class, note the lenght should be the same with the number of classes
  n_workers    : 2                # number of multi-thread workers
  verbose      : False            # toggle the debug info of this generator

Augmentor      :                  # do preprocessing and data augmentation
  zoom_range   : [0.8, 1.2]       # zoom the image/label with some scale in range, note that it only apply on x-y plane
  transpose    : True             # transpose the image/label on x-y plane with a fixed probability 0.5
  flip         : True             # flip the image/only on x-y plane with a fixed probability 0.5
  noise        : True             # add a Gaussian noise with mean 0 and sigam 0.05 on the image
  window_width : [90, 110]        # adjust the contrast with the given window width(or a range), default : 100
  window_level : [40, 60]         # adjust the contrast with the given window level(or a range), default : 50
  n_workers    : 2                # number of multi-thread workers
  verbose      : False            # toggle the debug info of this generator

BatchGenerator :                  # pack the processed data into batches
  batch_size   : 32               # batch size
  n_workers    : 2                # number of multi-thread workers
  verbose      : False            # toggle the debug info of this generator

The output files are stored in 3D NIfTI (nii.gz) in the outputs folder. One may view these images by ITK-SNAP.

sample generator

3.3. Check the data generator of bs-ptd-v2

Run the following command to check the data pipeline is fine.

./sample_generator.py \
    --loader-config configs/nrrd_loader.yaml \
    --generator-config configs/bs-ptd-v2-generator.yaml
This nrrd_loader.yaml doesn’t toggle resampling. If you didn’t run conversion beforehand, please use nrrd_loader_resample.yaml instead.

You can find tune the generator config to run on this test, that is modify

BlockGenerator :
  block_shape  : [96, 96, 96]
  stride       : 48
  crop_shape   : [256, 256, -1]
  n_workers    : 4
  ordered      : True
  queue_size   : 4               # unit: whole 3D volume from each scan (1)
  verbose      : False

Augmentor      :
  window_width : 100
  window_level : 50
  n_workers    : 1
  queue_size   : 4               # unit: small blocks extracted from 3D volume (1)
  verbose      : False

BatchGenerator :
  batch_size   : 24              (2)
  n_workers    : 1
  queue_size   : 4               # unit: batch data of size=batch_size (1)
  verbose      : False
1 Find a proper queue_size of max size of the data queue according to your CPU memory.
Note that the default size is 4. And setting it to 0 means unlimited queue_size.
2 Find a proper batch_size according to your GPU memory.
Large batch_size may not help the speed, it also depends on the utilization of GPUs. If you have found that the utilization of GPUs(can be monitored by running nvtop or watch -n1 nvidia-smi) has been full, then increasing the batch size won’t speedup the process.