IO500测试结果

beegfs 进行 IO500 测试

[global]
datadir = /mnt/beegfs/test/3
resultdir = ./results/

#verbosity = 10
timestamp-resultdir = TRUE
#drop-caches = TRUE

# Chose parameters that are very small for all benchmarks

[debug]
#stonewall-time = 1
stonewall-time = 300 # for testing

[ior-easy]
transferSize = 1m
blockSize = 200000m
# 112800m
# 72000m - not enough
# 51200m was not enough for stonewall=300
# 102400m
# 1024000m

[mdtest-easy]
# The API to be used
API = POSIX
# Files per proc
n = 5000000
# 500000 - small for multiple MDS
# 1000000

[ior-hard]
# The API to be used
API = POSIX
# Number of segments  10000000
segmentCount = 2200000

# 400000
# 95000

[mdtest-hard]
# The API to be used
API = POSIX
# Files per proc 1000000
n = 160000


[find]
#external-script = /mnt/beeond/io500-app/bin/pfind
# no need to set stonewall time or result directory  - let it use the defaults
#external-extra-args =  -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results
# below is used by io500.sh only.  The io500 C app will not use, since we are not using some external script, we want to use default pfind. For io500, you can set the -N and -q values using pfind-queue-length, pfind-steal-next
external-extra-args = -N -q 15000
#external-args =  -s $io500_stonewall_timer -r $io500_result_dir/pfind_results
#nproc = 30
pfind-queue-length = 15000
pfind-steal-next = TRUE
pfind-parallelize-single-dir-access-using-hashing = TRUE

mpiexec 命令表示每个节点执行 3 个进程,注意:多节点运行,该代码需要在所有节点都存在,建议使用共享存储来存放代码和结果

测试命令和输出如下

> mpiexec -n 9 -H 10.244.244.201:3,10.244.244.211:3,10.244.244.212:3 --allow-run-as-root /data/nfs/io500/io500 /data/nfs/io500/config-beegfs.ini
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   devmaster
  Local device: mlx4_0
--------------------------------------------------------------------------
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
; WARNING Using internal pfind, will ignore any arguments to the external script
WARNING Using internal pfind, will ignore any arguments to the external script
IO500 version io500-sc22_v2 (standard)
WARNING Using internal pfind, will ignore any arguments to the external script
[devmaster:631402] 8 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[devmaster:631402] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[RESULT]       ior-easy-write        0.337239 GiB/s : time 351.744 seconds
[RESULT]    mdtest-easy-write       22.901894 kIOPS : time 303.127 seconds
[      ]            timestamp        0.000000 kIOPS : time 0.000 seconds
[RESULT]       ior-hard-write        0.117426 GiB/s : time 860.469 seconds
ERROR INVALID (src/main.c:434) Runtime of phase (189.600956) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:440) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-hard-write        7.635674 kIOPS : time 189.601 seconds [INVALID]
[RESULT]                 find      168.742151 kIOPS : time 49.537 seconds
[RESULT]        ior-easy-read        0.318700 GiB/s : time 372.201 seconds
[RESULT]     mdtest-easy-stat      108.329951 kIOPS : time 64.875 seconds
[RESULT]        ior-hard-read        0.088844 GiB/s : time 1137.293 seconds
[RESULT]     mdtest-hard-stat      174.470509 kIOPS : time 9.260 seconds
[RESULT]   mdtest-easy-delete       31.236500 kIOPS : time 232.805 seconds
[RESULT]     mdtest-hard-read        7.079911 kIOPS : time 204.396 seconds
[RESULT]   mdtest-hard-delete        9.296353 kIOPS : time 157.333 seconds
[SCORE ] Bandwidth 0.182990 GiB/s : IOPS 32.168215 kiops : TOTAL 2.426203 [INVALID]

The result files are stored in the directory: ./results//2023.02.16-17.58.59
[1676545471.963235] [devmaster:631410:0]       tag_match.c:62   UCX  WARN  unexpected tag-receive descriptor 0x2646ac0 was not matched

拉高优先级

pgrep beegfs-mgmtd | xargs ionice -c 3 -p
pgrep beegfs-mgmtd |  xargs renice -n -20

pgrep beegfs-meta | xargs ionice -c 3 -p
pgrep beegfs-meta |  xargs renice -n -20

pgrep beegfs-storage | xargs ionice -c 3 -p
pgrep beegfs-storage |  xargs renice -n -20

pgrep beegfs-helperd | xargs ionice -c 3 -p
pgrep beegfs-helperd |  xargs renice -n -20

pgrep io500 | xargs ionice -c 3 -p
pgrep io500 |  xargs renice -n -20

更多参数参考:https://github.com/IO500/submission-data/blob/master/Oracle%20Cloud%20Infrastructure/BeeGFS%20on%20Oracle%20Cloud/HPC-BeeGFS/2020/submission-25/io500.sh

#!/bin/bash
#
# INSTRUCTIONS:
# This script takes its parameters from the same .ini file as io500 binary.

function setup_paths {
  # Set the paths to the binaries and how to launch MPI jobs.
  # If you ran ./utilities/prepare.sh successfully, then binaries are in ./bin/
  io500_ior_cmd=$PWD/bin/ior
  io500_mdtest_cmd=$PWD/bin/mdtest
  io500_mdreal_cmd=$PWD/bin/md-real-io
  io500_mpirun="mpiexec"
  io500_mpiargs="--allow-run-as-root -mca btl self -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 -n 2040 -npernode 12 --hostfile /mnt/beeond/hostsfile.cn"
}

function setup_directories {
  local workdir
  local resultdir
  local ts

  # set directories where benchmark files are created and where the results go
  # If you want to set up stripe tuning on your output directories or anything
  # similar, then this is the right place to do it.  This creates the output
  # directories for both the app run and the script run.

  timestamp=$(date +%Y.%m.%d-%H.%M.%S)           # create a uniquifier
  [ $(get_ini_global_param timestamp-datadir True) != "False" ] &&
    ts="$timestamp" || ts="io500"
  # directory where the data will be stored
  workdir=$(get_ini_global_param datadir $PWD/datafiles)/$ts
  io500_workdir=$workdir-scr
  [ $(get_ini_global_param timestamp-resultdir True) != "False" ] &&
    ts="$timestamp" || ts="io500"
  # the directory where the output results will be kept
  resultdir=$(get_ini_global_param resultdir $PWD/results)/$ts
  io500_result_dir=$resultdir-scr

  mkdir -p $workdir-{scr,app} $resultdir-{scr,app}
  mkdir -p $workdir-{scr,app}/{ior_easy,ior_hard,mdt_easy,mdt_hard}

# for ior_easy.
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=1m --numtargets=4 $PWD/$workdir-scr/ior_easy
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=1m --numtargets=4 $PWD/$workdir-app/ior_easy

# stripe across all OSTs for ior_hard, 256k chunksize
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=1m --numtargets=170 $PWD/$workdir-scr/ior_hard
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=1m --numtargets=170 $PWD/$workdir-app/ior_hard

# for ior_easy.
##beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=1m --numtargets=4 $PWD/$workdir-scr/ior_easy
##beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=1m --numtargets=4 $PWD/$workdir-app/ior_easy

# stripe across all OSTs for ior_hard, 256k chunksize
#beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=256k --numtargets=3 $PWD/$workdir-scr/ior_hard
#beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=256k --numtargets=3 $PWD/$workdir-app/ior_hard

# turn off striping and use small chunks for mdtest
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=64k --numtargets=1 $PWD/$workdir-scr/mdt_easy
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=64k --numtargets=1 $PWD/$workdir-scr/mdt_hard

beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=64k --numtargets=1 $PWD/$workdir-app/mdt_easy
beegfs-ctl --mount=/mnt/beeond --setpattern --chunksize=64k --numtargets=1 $PWD/$workdir-app/mdt_hard

}

# you should not edit anything below this line
set -eo pipefail  # better error handling

io500_ini="${1:-""}"
if [[ -z "$io500_ini" ]]; then
  echo "error: ini file must be specified.  usage: $0 <config.ini>"
  exit 1
fi
if [[ ! -s "$io500_ini" ]]; then
  echo "error: ini file '$io500_ini' not found or empty"
  exit 2
fi

function get_ini_section_param() {
  local section="$1"
  local param="$2"
  local inside=false

  while read LINE; do
    LINE=$(sed -e 's/ *#.*//' -e '1s/ *= */=/' <<<$LINE)
    $inside && [[ "$LINE" =~ "[.*]" ]] && inside=false && break
    [[ -n "$section" && "$LINE" =~ "[$section]" ]] && inside=true && continue
    ! $inside && continue
    #echo $LINE | awk -F = "/^$param/ { print \$2 }"
    if [[ $(echo $LINE | grep "^$param *=" ) != "" ]] ; then
      # echo "$section : $param : $inside : $LINE" >> parsed.txt # debugging
      echo $LINE | sed -e "s/[^=]*=[ \t]*\(.*\)/\1/"
      return
    fi
  done < $io500_ini
  echo ""
}

function get_ini_param() {
  local section="$1"
  local param="$2"
  local default="$3"

  # try and get the most-specific param first, then more generic params
  val=$(get_ini_section_param $section $param)
  [ -n "$val" ] || val="$(get_ini_section_param ${section%-*} $param)"
  [ -n "$val" ] || val="$(get_ini_section_param global $param)"

  echo "${val:-$default}" |
    sed -e 's/[Ff][Aa][Ll][Ss][Ee]/False/' -e 's/[Tt][Rr][Uu][Ee]/True/'
}

function get_ini_run_param() {
  local section="$1"
  local default="$2"
  local val

  val=$(get_ini_section_param $section noRun)

  # logic is reversed from "noRun=TRUE" to "run=False"
  [[ $val = [Tt][Rr][Uu][Ee] ]] && echo "False" || echo "$default"
}

function get_ini_global_param() {
  local param="$1"
  local default="$2"
  local val

  val=$(get_ini_section_param global $param |
    sed -e 's/[Ff][Aa][Ll][Ss][Ee]/False/' -e 's/[Tt][Rr][Uu][Ee]/True/')

  echo "${val:-$default}"
}

# does the write phase and enables the subsequent read
io500_run_ior_easy="$(get_ini_run_param ior-easy True)"
# does the creat phase and enables the subsequent stat
io500_run_md_easy="$(get_ini_run_param mdtest-easy True)"
# does the write phase and enables the subsequent read
io500_run_ior_hard="$(get_ini_run_param ior-hard True)"
# does the creat phase and enables the subsequent read
io500_run_md_hard="$(get_ini_run_param mdtest-hard True)"
io500_run_find="$(get_ini_run_param find True)"
io500_run_ior_easy_read="$(get_ini_run_param ior-easy-read True)"
io500_run_md_easy_stat="$(get_ini_run_param mdtest-easy-stat True)"
io500_run_ior_hard_read="$(get_ini_run_param ior-hard-read True)"
io500_run_md_hard_stat="$(get_ini_run_param mdtest-easy-stat True)"
io500_run_md_hard_read="$(get_ini_run_param mdtest-easy-stat True)"
# turn this off if you want to just run find by itself
io500_run_md_easy_delete="$(get_ini_run_param mdtest-easy-delete True)"
# turn this off if you want to just run find by itself
io500_run_md_hard_delete="$(get_ini_run_param mdtest-hard-delete True)"
io500_run_md_hard_delete="$(get_ini_run_param mdtest-hard-delete True)"
io500_run_mdreal="$(get_ini_run_param mdreal False)"
# attempt to clean the cache after every benchmark, useful for validating the performance results and for testing with a local node; it uses the io500_clean_cache_cmd (can be overwritten); make sure the user can write to /proc/sys/vm/drop_caches
io500_clean_cache="$(get_ini_global_param drop-caches False)"
io500_clean_cache_cmd="$(get_ini_global_param drop-caches-cmd)"
io500_cleanup_workdir="$(get_ini_run_param cleanup)"
# Stonewalling timer, set to 300 to be an official run; set to 0, if you never want to abort...
io500_stonewall_timer=$(get_ini_param debug stonewall-time 300)
# Choose regular for an official regular submission or scc for a Student Cluster Competition submission to execute the test cases for 30 seconds instead of 300 seconds
io500_rules="regular"

# to run this benchmark, find and edit each of these functions.  Please also
# also edit 'extra_description' function to help us collect the required data.
function main {
  setup_directories
  setup_paths
  setup_ior_easy # required if you want a complete score
  setup_ior_hard # required if you want a complete score
  setup_mdt_easy # required if you want a complete score
  setup_mdt_hard # required if you want a complete score
  setup_find     # required if you want a complete score
  setup_mdreal   # optional

  run_benchmarks

  if [[ ! -s "system-information.txt" ]]; then
    echo "Warning: please create a system-information.txt description by"
    echo "copying the information from https://vi4io.org/io500-info-creator/"
  else
    cp "system-information.txt" $io500_result_dir
  fi

  create_tarball
}

function setup_ior_easy {
  local params

  io500_ior_easy_size=$(get_ini_param ior-easy blockSize 9920000m | tr -d m)
  val=$(get_ini_param ior-easy API POSIX)
  [ -n "$val" ] && params+=" -a $val"
  val="$(get_ini_param ior-easy transferSize)"
  [ -n "$val" ] && params+=" -t $val"
  val="$(get_ini_param ior-easy hintsFileName)"
  [ -n "$val" ] && params+=" -U $val"
  val="$(get_ini_param ior-easy posix.odirect)"
  [ "$val" = "True" ] && params+=" --posix.odirect"
  val="$(get_ini_param ior-easy verbosity)"
  if [ -n "$val" ]; then
    for i in $(seq $val); do
      params+=" -v"
    done
  fi
  io500_ior_easy_params="$params"
  echo -n ""
}

function setup_mdt_easy {
  io500_mdtest_easy_params="-u -L" # unique dir per thread, files only at leaves

  val=$(get_ini_param mdtest-easy n 1000000)
  [ -n "$val" ] && io500_mdtest_easy_files_per_proc="$val"
  val=$(get_ini_param mdtest-easy API POSIX)
  [ -n "$val" ] && io500_mdtest_easy_params+=" -a $val"
  val=$(get_ini_param mdtest-easy posix.odirect)
  [ "$val" = "True" ] && io500_mdtest_easy_params+=" --posix.odirect"
  echo -n ""
}

function setup_ior_hard {
  local params

  io500_ior_hard_api=$(get_ini_param ior-hard API POSIX)
  io500_ior_hard_writes_per_proc="$(get_ini_param ior-hard segmentCount 10000000)"
  val="$(get_ini_param ior-hard hintsFileName)"
  [ -n "$val" ] && params+=" -U $val"
  val="$(get_ini_param ior-hard posix.odirect)"
  [ "$val" = "True" ] && params+=" --posix.odirect"
  val="$(get_ini_param ior-easy verbosity)"
  if [ -n "$val" ]; then
    for i in $(seq $val); do
      params+=" -v"
    done
  fi
  io500_ior_hard_api_specific_options="$params"
  echo -n ""
}

function setup_mdt_hard {
  val=$(get_ini_param mdtest-hard n 1000000)
  [ -n "$val" ] && io500_mdtest_hard_files_per_proc="$val"
  io500_mdtest_hard_api="$(get_ini_param mdtest-hard API POSIX)"
  io500_mdtest_hard_api_specific_options=""
  echo -n ""
}

function setup_find {
  val="$(get_ini_param find external-script)"
  [ -z "$val" ] && io500_find_mpi="True" && io500_find_cmd="$PWD/bin/pfind" ||
    io500_find_cmd="$val"
  # uses stonewalling, run pfind
  io500_find_cmd_args="$(get_ini_param find external-extra-args)"
  echo -n ""
}

function setup_mdreal {
  echo -n ""
}

function run_benchmarks {
  local app_first=$((RANDOM % 100))
  local app_rc=0

  # run the app and C version in random order to try and avoid bias
  (( app_first >= 50 )) && $io500_mpirun $io500_mpiargs $PWD/io500 $io500_ini --timestamp $timestamp || app_rc=$?

  # Important: source the io500_fixed.sh script.  Do not change it. If you
  # discover a need to change it, please email the mailing list to discuss.
  source build/io500-dev/utilities/io500_fixed.sh 2>&1 |
    tee $io500_result_dir/io-500-summary.$timestamp.txt

  (( $app_first >= 50 )) && return $app_rc

  echo "The io500.sh was run"
  echo
  echo "Running the C version of the benchmark now"
  # run the app and C version in random order to try and avoid bias
  $io500_mpirun $io500_mpiargs $PWD/io500 $io500_ini --timestamp $timestamp
}

create_tarball() {
  local sourcedir=$(dirname $io500_result_dir)
  local fname=$(basename ${io500_result_dir%-scr})
  local tarball=$sourcedir/io500-$HOSTNAME-$fname.tgz

  cp -v $0 $io500_ini $io500_result_dir
  tar czf $tarball -C $sourcedir $fname-{app,scr}
  echo "Created result tarball $tarball"
}

# Information fields; these provide information about your system hardware
# Use https://vi4io.org/io500-info-creator/ to generate information about
# your hardware that you want to include publicly!
function extra_description {
  # UPDATE: Please add your information into "system-information.txt" pasting the output of the info-creator
  # EXAMPLE:
  # io500_info_system_name='xxx'
  # DO NOT ADD IT HERE
  :
}

main

输出结果包含如下

test score units
ior-easy-write GiB/s
mdtest-easy-write kIOPS
ior-hard-write GiB/s
mdtest-hard-write kIOPS
find kIOPS
ior-easy-read GiB/s
mdtest-easy-stat kIOPS
ior-hard-read GiB/s
mdtest-hard-stat kIOPS
mdtest-easy-delete kIOPS
mdtest-hard-read kIOPS
mdtest-hard-delete kIOPS

指定的文件夹中也会生成一个日期标记的测试结果集文件夹

> ls
2023.02.16-17.58.59

该文件夹下有如下内容:

> ls     
config-orig.ini    ior-easy-read.txt   ior-hard-read.txt       mdtest-easy-delete.txt  mdtest-easy-write.txt   mdtest-hard-read.txt   mdtest-hard-write.txt
config.ini         ior-easy-write.csv  ior-hard-write.csv      mdtest-easy-stat.csv    mdtest-hard-delete.csv  mdtest-hard-stat.csv   result.txt
find.csv           ior-easy-write.txt  ior-hard-write.txt      mdtest-easy-stat.txt    mdtest-hard-delete.txt  mdtest-hard-stat.txt   result_summary.txt
ior-easy-read.csv  ior-hard-read.csv   mdtest-easy-delete.csv  mdtest-easy-write.csv   mdtest-hard-read.csv    mdtest-hard-write.csv  timestampfile