跳转至

调优存储节点

汇总

  • NVME
echo none > /sys/block/sdX/queue/scheduler
echo 4096 > /sys/block/sdX/queue/read_ahead_kb
echo 1024 > /sys/block/sdX/queue/max_sectors_kb
echo 2 > /sys/block/sdX/queue/nomerges
echo 1 > /sys/block/sdX/queue/rq_affinity
beegfs_ha_beegfs_storage_conf_default_options:                # Default storage configuration options.
  storeAllowFirstRunInit: "false"
  connMaxInternodeNum: 128
  connMgmtdPortTCP: 8008
  connMgmtdPortUDP: 8008
  connStoragePortTCP: 8003
  connStoragePortUDP: 8003
  tuneNumStreamListeners: 2
  tuneNumWorkers: 14
  tuneUseAggressiveStreamPoll: "true"
  tuneFileReadSize: 2048k
  tuneFileWriteSize: 2048k
  connUseRDMA: "true"
  sysTargetOfflineTimeoutSecs: 900    # This is required to avoid the mgmt service prematurely
                                      #   placing targets offline if the preferred meta/storage
                                      #   interface fails (ESOLA-116).

分区建议

下面的示例显示了在 8 个磁盘上创建具有较大 inode 的 XFS 分区的命令(其中数字 8 不包括 RAID-5 或 RAID-6 奇偶校验磁盘的数量)和 128 KB 块大小

128 KB 块大小

mkfs.xfs -d su=128k,sw=8 -l version=2,su=128k -isize=512 /dev/sdX

挂载选项(建议去掉 largio)

 mount -o noatime,nodiratime,logbufs=8,logbsize=256k,largeio,\
           inode64,swalloc,allocsize=131072k \
        /dev/sdX <mountpoint>

使用 XFS 并希望实现最佳的流式写入吞吐量,还需要添加挂载选项allocsize=131072k以降低大文件碎片化的风险

其他

为文件服务器设置适当的 IO 调度程序:

echo deadline > /sys/block/sdX/queue/scheduler

通过增加请求队列的大小为 IO 调度程序提供更多灵活性:

echo 2048 > /sys/block/sdX/queue/nr_requests

为了提高顺序读取的吞吐量,请增加预读数据的最大数量。实际的预读量是自适应的,因此在此处使用较高的值不会损害小型随机访问的性能

echo 4096 > /sys/block/sdX/queue/read_ahead_kb
echo 128 > /sys/block/sdX/queue/max_sectors_kb

参考文档:

虚拟机内存

为了避免在工作负载差异很大的生产环境中写入缓存刷新出现较长的 IO 停顿(延迟),通常需要限制内核脏(写入)缓存大小

echo 5 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio

参考文档:https://blog.csdn.net/weixin_44410537/article/details/98449706

要获得最佳的持续流式传输性能,则可能需要使用不同的设置,这些设置可以很早就启动异步数据写入,并允许将大部分 RAM 用于写入缓存

echo 1 > /proc/sys/vm/dirty_background_ratio
echo 75 > /proc/sys/vm/dirty_ratio

为 inode 缓存分配稍高的优先级有助于避免磁盘搜索 inode 加载:

echo 50 > /proc/sys/vm/vfs_cache_pressure

文件系统数据的缓冲需要频繁分配内存。提高保留内核内存量将使关键情况下的内存分配更快、更可靠。如果内存不足 8GB,请将相应值提高到 64 MB,否则,将其提高到至少 256 MB:

echo 262144 > /proc/sys/vm/min_free_kbytes

建议启用大页:

echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

汇总

echo 1 > /proc/sys/vm/dirty_background_ratio
echo 75 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 262144 > /proc/sys/vm/min_free_kbytes
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

系统 BIOS

动态 CPU 时钟频率调整功能用于节省电量,通常默认启用,但对延迟有很大影响。因此,建议关闭动态 CPU 频率调整

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null

tuned-adm profile throughput-performance

并发调优

通过设置tuneNumWorkers 的值来控制工作线程的数量。通常,工作线程数量越多,并行性就越高(例如,服务器将并行处理更多客户端请求)。但工作线程数量越多也会导致更多的并发磁盘访问,因此,尤其是在存储服务器上,理想的工作线程数量可能取决于您使用的磁盘数量

sed -i 's/^connMaxInternodeNum.*/connMaxInternodeNum = 800/g' /etc/beegfs/beegfs-meta.conf
sed -i 's/^tuneNumWorkers.*/tuneNumWorkers = 128/g' /etc/beegfs/beegfs-meta.conf

其他

tuneFileReadSize             = 256k
tuneFileWriteSize            = 256k

tuneWorkerBufSize            = 16m

总结

sed -i 's/tuneNumWorkers.*=.*/tuneNumWorkers               = 64/g' /etc/beegfs/beegfs-storage.conf

ethtool -G $nic_card rx 2047 tx 2047 rx-jumbo 8191

echo 5 > /proc/sys/vm/dirty_background_ratio
echo 20 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 262144 > /proc/sys/vm/min_free_kbytes
echo 1 > /proc/sys/vm/zone_reclaim_mode

echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

echo "net.ipv4.tcp_timestamps = 0" >> /etc/sysctl.conf
echo "net.ipv4.tcp_window_scaling = 1" >> /etc/sysctl.conf
echo "net.ipv4.tcp_adv_win_scale=1" >> /etc/sysctl.conf
echo "net.ipv4.tcp_low_latency=1" >> /etc/sysctl.conf
echo "net.ipv4.tcp_sack = 1" >> /etc/sysctl.conf

echo "net.core.wmem_max=16777216" >> /etc/sysctl.conf
echo "net.core.rmem_max=16777216" >> /etc/sysctl.conf
echo "net.core.wmem_default=16777216" >> /etc/sysctl.conf
echo "net.core.rmem_default=16777216" >> /etc/sysctl.conf
echo "net.core.optmem_max=16777216" >> /etc/sysctl.conf
echo "net.core.netdev_max_backlog=27000" >> /etc/sysctl.conf

echo "net.ipv4.tcp_rmem = 212992 87380 16777216" >> /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 212992 65536 16777216" >> /etc/sysctl.conf

/sbin/sysctl -p /etc/sysctl.conf

tuned-adm profile throughput-performance

devices=(sdb sdc)
for dev in "${devices[@]}"
do
  echo deadline > /sys/block/${dev}/queue/scheduler
  echo 4096 > /sys/block/${dev}/queue/nr_requests
  echo 4096 > /sys/block/${dev}/queue/read_ahead_kb
  echo 256 > /sys/block/${dev}/queue/max_sectors_kb
done

调整参数

sed -i 's/^connMaxInternodeNum.*/connMaxInternodeNum = 800/g' /etc/beegfs/beegfs-storage.conf
sed -i 's/^tuneNumWorkers.*/tuneNumWorkers = 128/g' /etc/beegfs/beegfs-storage.conf
sed -i 's/^tuneFileReadAheadSize.*/tuneFileReadAheadSize = 32m/g' /etc/beegfs/beegfs-storage.conf
sed -i 's/^tuneFileReadAheadTriggerSize.*/tuneFileReadAheadTriggerSize = 2m/g' /etc/beegfs/beegfs-storage.conf
sed -i 's/^tuneFileReadSize.*/tuneFileReadSize = 256k/g' /etc/beegfs/beegfs-storage.conf
sed -i 's/^tuneFileWriteSize.*/tuneFileWriteSize = 256k/g' /etc/beegfs/beegfs-storage.conf
sed -i 's/^tuneWorkerBufSize.*/tuneWorkerBufSize = 16m/g' /etc/beegfs/beegfs-storage.conf