* update to 7.1.2 (point to latest release)

* mount and format drives in cluster init
* add monitor node array
* disks are no longer persistent
* now you can deallocate the entire cluster and restart it (disks and hostname are preserved)
* a dedicated RAID array for metadata
Michael Requa 2019-01-25 13:11:51 -08:00
Parent 948fccceee
Commit 5ccd19c0d4
14 changed files: 355 additions and 57 deletions

View file

@@ -1,7 +1,63 @@
# Using CycleCloud for setting up a BeeGFS Cluster on Azure
A CycleCloud Project for starting a BeeGFS cluster in Azure. More info to follow.
A CycleCloud Project for starting a BeeGFS cluster in Azure.
## Performance, Capacity and Cost Planning
The BeeGFS cluster here is a collection of VMs with attached Azure Premium Managed
disks. Each storage node hosts both the BeeGFS-Metadata and -Storage daemons with
one RAID array for each.
HPC workloads can have widely varying I/O requirements. To illustrate the trade-offs
among the configuration options, consider this example storage node configuration:
* Storage VM
* 2 - Metadata Disks (Raid0)
* 4 - Storage Disks (Raid0)
The Azure P30 offers the best performance/cost ratio among premium disks and provides 1 TB of storage.
Using the P30 as the primary storage device, each storage node would have 4 TB of capacity.
A rule of thumb is to provision metadata storage at 1/4 of object storage capacity, so 2 x P20 metadata
disks, providing 1 TB of metadata storage, is a reasonable initial design.
With 4 x P30 the VM has access to *1 GB/s* or *30 kIOPS*, whichever limit is reached first, based
on the specifications of the block devices. A virtual machine also has its own throughput
allowance; a Standard_D32s_v3 storage VM, for example, offers *768 MB/s* or *51.2 kIOPS*.
* Storage VM Standard_D32s_v3 (*768 MB/s*, *51.2 kIOPS*)
* Metadata: 2 x P20 (total: 1 TB, *300 MB/s*, *4.6 kIOPS*)
* Storage: 4 x P30 (total: 4 TB, *1 GB/s*, *30 kIOPS*)
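Because both caps apply, the effective limit for a node is the lesser of the VM allowance and the
aggregate disk allowance. Below is a minimal back-of-the-envelope sketch in bash, using per-disk
figures implied by the totals above (illustrative numbers, not authoritative Azure specifications):

```bash
#!/bin/bash
# Effective node limit = min(VM allowance, aggregate disk allowance).
VM_MBPS=768                 # Standard_D32s_v3 throughput cap (MB/s)
VM_IOPS=51200               # Standard_D32s_v3 IOPS cap
DISK_MBPS=$((4 * 250))      # 4 x P30 at ~250 MB/s each (implied by the 1 GB/s total)
DISK_IOPS=$((4 * 7500))     # 4 x P30 at ~7.5 kIOPS each (implied by the 30 kIOPS total)

EFF_MBPS=$(( DISK_MBPS < VM_MBPS ? DISK_MBPS : VM_MBPS ))
EFF_IOPS=$(( DISK_IOPS < VM_IOPS ? DISK_IOPS : VM_IOPS ))
echo "Effective node limit: ${EFF_MBPS} MB/s, ${EFF_IOPS} IOPS"
# Prints: Effective node limit: 768 MB/s, 30000 IOPS
```

In this example the VM allowance bounds throughput, while the disks bound IOPS.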
A cluster can be constructed from one or more of these storage nodes; with each node added, the
net resources available to the pool of clients grow proportionally.
## Cluster Life-Cycle
It is possible to delete data and disks managed by CycleCloud from the CycleCloud UI,
which can result in data loss. The following actions are available in CycleCloud management:
* Create Cluster - creates storage VMs and disks
* Add Node - adds a node, increasing the size and resources of the cluster.
* Shutdown/Deallocate Node - suspends the node but preserves its disks.
* Start Deallocated Node - restores the data and resources of a deallocated node.
* Shutdown/Delete Node - deletes the VM and its disks; data on the disks is destroyed.
* Terminate Cluster - deletes all VMs and disks; all data is destroyed.
It is possible to create a BeeGFS cluster, populate it with data, and then deallocate the
VMs when the workload is finished so that the cluster can be restarted later.
This is helpful for controlling costs, because charges for the VMs are suspended while they
are deallocated. Keep in mind that the disks still accrue charges while the VMs are
deallocated.
![CC VM Deallocate](/images/deallocate.png "Preserve data by deallocating VMs")
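Deallocation is normally performed through the CycleCloud UI, but the same VM-level operations
exist in the Azure CLI; a sketch for reference, where the resource group `beegfs-rg` and VM name
`beegfs-storage-1` are placeholders:

```bash
# Deallocate the VM: compute billing stops, managed disks are kept (and still billed).
az vm deallocate --resource-group beegfs-rg --name beegfs-storage-1

# Later, restart the VM with its disks intact.
az vm start --resource-group beegfs-rg --name beegfs-storage-1
```

Bypassing CycleCloud this way can leave its view of node state stale, so prefer the UI actions
listed above.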
## Monitoring
This cluster includes a recipe for I/O monitoring in _Grafana_. Adding a _monitor_
node to the cluster starts a VM that hosts a monitoring UI. The service can be
accessed over _HTTP_ on port 3000 of the monitor host (username: `admin`, default password: `admin`).
![BeeGFS Monitoring](/images/grafana.png "Monitor IOPS, Throughput, Requests")
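If the monitor host is only reachable on the private network, an SSH tunnel is one way to reach
the UI. A sketch, assuming a placeholder monitor address of `10.0.0.8`:

```bash
# Forward local port 3000 to Grafana on the monitor node (address is a placeholder).
ssh -L 3000:localhost:3000 cyclecloud@10.0.0.8

# In another shell, confirm Grafana is responding, then browse to http://localhost:3000.
curl -s http://localhost:3000/api/health
```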
# Contributing
@@ -15,4 +71,4 @@ provided by the bot. You will only need to do this once across all repos using o
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

Binary data
images/deallocate.png: new file (binary file not shown, 35 KiB)

Binary data
images/grafana.png: new file (binary file not shown, 147 KiB)

View file

@@ -1,5 +1,5 @@
[project]
version = 1.1.0
version = 1.2.0
name = beegfs
label = BeeGFS
type = Infrastructure
@@ -19,3 +19,5 @@ run_list = recipe[beegfs::mds], recipe[beegfs::oss], recipe[cganglia::client]
[spec beeond]
run_list = recipe[beegfs::beeond], recipe[cganglia::client]
[spec mon]
run_list = recipe[beegfs::mon], recipe[cganglia::client]

View file

@@ -1,6 +1,7 @@
default['beegfs']['repo']['version'] = 'beegfs_7_1'
default['beegfs']['repo']['version'] = 'latest-stable'
default['beegfs']['repo']['key_version'] = 'beegfs_7_1'
default['beegfs']['repo']['yum']['baseurl'] = "https://www.beegfs.io/release/#{node['beegfs']['repo']['version']}/dists/rhel#{node['platform_version'].to_i}"
default['beegfs']['repo']['yum']['gpgkey'] = "https://www.beegfs.io/release/#{node['beegfs']['repo']['version']}/gpg/RPM-GPG-KEY-beegfs"
default['beegfs']['repo']['yum']['gpgkey'] = "https://www.beegfs.io/release/#{node['beegfs']['repo']['key_version']}/gpg/RPM-GPG-KEY-beegfs"
default['beegfs']['repo']['apt']['uri'] = "https://www.beegfs.io/release/#{node['beegfs']['repo']['version']}"
default['beegfs']['repo']['apt']['arch'] = 'amd64'

View file

@@ -0,0 +1,8 @@
proc.sys.vm.dirty_background_ratio=5
proc.sys.vm.dirty_ratio=20
proc.sys.vm.vfs_cache_pressure=50
proc.sys.vm.min_free_kbytes=262144
proc.sys.vm.zone_reclaim_mode=1
sys.kernel.mm.transparent_hugepage.enabled=always
sys.kernel.mm.transparent_hugepage.defrag=always
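These dotted keys correspond one-to-one to kernel tunables under `/proc` and `/sys`, the same
values the cluster-init script writes directly (see `config_meta_host` below). A minimal sketch
of applying one line by translating dots into path separators; `apply_tunable` is a hypothetical
helper, not part of the recipe:

```bash
#!/bin/bash
# Translate "proc.sys.vm.dirty_ratio=20" into a write to /proc/sys/vm/dirty_ratio.
# Must run as root to write under /proc and /sys.
apply_tunable() {
  local key="${1%%=*}"
  local value="${1#*=}"
  local path="/${key//./\/}"   # dots become slashes: proc.sys.vm.x -> /proc/sys/vm/x
  echo "$value" > "$path"
}

apply_tunable "proc.sys.vm.dirty_ratio=20"
apply_tunable "sys.kernel.mm.transparent_hugepage.enabled=always"
```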

View file

@@ -2,12 +2,14 @@ include_recipe "::default"
packages = case node['platform_family']
when 'rhel'
%w{beegfs-client beegfs-helperd beegfs-utils gcc gcc-c++}
when 'debian'
%w{beegfs-client beegfs-helperd beegfs-utils gcc cpp}
%w{beegfs-client beegfs-helperd beegfs-utils gcc gcc-c++}.each do |pkg|
package pkg do
not_if "rpm -qa | grep #{pkg}"
end
end
packages.each { |p| package p }
when 'debian'
%w{beegfs-client beegfs-helperd beegfs-utils gcc cpp}.each { |p| package p }
end
# Problem with some images running an outdated kernel version,
# where the kernel headers don't exist in the repos anymore.

View file

@@ -19,7 +19,17 @@ end
include_recipe '::_search_manager'
# install the beegfs-client and utils package in each node.
%w(beegfs-utils beegfs-client).each { |p| package p }
packages = case node['platform_family']
when 'rhel'
%w{beegfs-utils beegfs-client}.each do |pkg|
package pkg do
not_if "rpm -qa | grep #{pkg}"
end
end
when 'debian'
%w{beegfs-utils beegfs-client}.each { |p| package p }
end
manager_ipaddress = node['beegfs']['manager_ipaddress']
beegfs_client_conf_file = '/etc/beegfs/beegfs-client.conf'

View file

@@ -6,8 +6,17 @@ include_recipe "::default"
node.override["beegfs"]["is_manager"] = true
cluster.store_discoverable()
# install packages
%w{beegfs-mgmtd beegfs-helperd beegfs-admon}.each { |p| package p }
packages = case node['platform_family']
when 'rhel'
%w{beegfs-mgmtd beegfs-helperd beegfs-admon}.each do |pkg|
package pkg do
not_if "rpm -qa | grep #{pkg}"
end
end
when 'debian'
%w{beegfs-mgmtd beegfs-helperd beegfs-admon}.each { |p| package p }
end
mgmtd_directory = "#{node["beegfs"]["root_dir"]}/mgmtd"
directory "#{mgmtd_directory}" do
@@ -46,4 +55,4 @@ ruby_block "Update #{beegfs_admon_conf_file}" do
notifies :restart, 'service[beegfs-admon]', :immediately
end
# gotta add a cron for discovering master

View file

@@ -3,12 +3,18 @@
include_recipe "::default"
include_recipe "::_tune_beegfs"
%w{beegfs-meta}.each { |p| package p }
packages = case node['platform_family']
when 'rhel'
%w{beegfs-meta}.each do |pkg|
package pkg do
not_if "rpm -qa | grep #{pkg}"
end
end
when 'debian'
%w{beegfs-meta}.each { |p| package p }
end
meta_directory = "#{node["beegfs"]["root_dir"]}/meta"
directory "#{meta_directory}" do
recursive true
end
# manager_ipaddress = ::BeeGFS::Helpers.search_for_manager(node['cyclecloud']['cluster']['id'])
manager_ipaddress = node["beegfs"]["manager_ipaddress"]
@@ -17,10 +23,6 @@ chef_state = node['cyclecloud']['chefstate']
beegfs_meta_conf_file = "/etc/beegfs/beegfs-meta.conf"
hostname_line = "sysMgmtdHost = #{manager_ipaddress}"
service "beegfs-meta" do
action [:enable]
end
# Run the reconfig again if the manager host changes:
ruby_block "Update #{beegfs_meta_conf_file}" do
block do
@@ -32,6 +34,15 @@ ruby_block "Update #{beegfs_meta_conf_file}" do
file.write_file
end
not_if "grep -q '#{hostname_line}' #{beegfs_meta_conf_file}"
notifies :restart, 'service[beegfs-meta]', :immediately
end
defer_block "Defer starting beegfs until end of the converge" do
directory "#{meta_directory}" do
recursive true
end
service "beegfs-meta" do
action [:enable, :start]
end
end

View file

@@ -14,7 +14,11 @@ execute 'install_graf_rpm' do
not_if 'rpm -qa | grep grafana-5.4.2-1.x86_64'
end
%w{ beegfs-mon influxdb grafana}.each { |p| package p }
%w{beegfs-mon influxdb grafana}.each do |pkg|
package pkg do
not_if "rpm -qa | grep #{pkg}"
end
end
# manager_ipaddress = ::BeeGFS::Helpers.search_for_manager(node['cyclecloud']['cluster']['id'])
manager_ipaddress = node["beegfs"]["manager_ipaddress"]

View file

@@ -3,8 +3,16 @@
include_recipe "::default"
include_recipe "::_tune_beegfs"
%w{beegfs-storage}.each { |p| package p }
packages = case node['platform_family']
when 'rhel'
%w{beegfs-storage}.each do |pkg|
package pkg do
not_if "rpm -qa | grep #{pkg}"
end
end
when 'debian'
%w{beegfs-storage}.each { |p| package p }
end
storage_directory = "#{node["beegfs"]["root_dir"]}/storage"
directory "#{storage_directory}" do
@@ -19,10 +27,7 @@ beegfs_storage_conf_file = "/etc/beegfs//beegfs-storage.conf"
hostname_line = "sysMgmtdHost = #{manager_ipaddress}"
# Run the reconfig again if the manager host changes:
service "beegfs-storage" do
action [:enable]
end
ruby_block "Update #{beegfs_storage_conf_file}" do
block do
file = Chef::Util::FileEdit.new(beegfs_storage_conf_file)
@@ -38,8 +43,16 @@ ruby_block "Update #{beegfs_storage_conf_file}" do
file.write_file
end
not_if "grep -q '#{hostname_line}' #{beegfs_storage_conf_file}"
notifies :restart, 'service[beegfs-storage]', :immediately
end
defer_block "Defer starting beegfs until end of the converge" do
directory "#{storage_directory}" do
recursive true
end
service "beegfs-storage" do
action [:enable, :start]
end
end

View file

@@ -0,0 +1,116 @@
#!/bin/bash
set -x
setup_storage_disks()
{
#replace with jq
mount=$1
raidDevice=$2
BEEGFS_ROOT=`jetpack config beegfs.root_dir || echo "/data/beegfs"`
# Bail out early if no LUNs are configured for this mount.
STORAGE_LUNS=`jetpack config cyclecloud.mounts.${mount}.luns` || { echo "no drives to configure for $mount"; return 0; }
filesystem=`jetpack config cyclecloud.mounts.${mount}.type || echo "ext4"`
VOLUME_TYPE=`jetpack config cyclecloud.mounts.${mount}.raid_level || echo "0"`
FS_OPTS=`jetpack config cyclecloud.mounts.${mount}.fs_options || echo "-i 2048 -I 512 -J size=400 -Odir_index,filetype"`
MOUNT_OPTS=`jetpack config cyclecloud.mounts.${mount}.options || echo "noatime,nodiratime,nobarrier,nofail"`
mountPoint=`jetpack config cyclecloud.mounts.${mount}.mountpoint || echo "$BEEGFS_ROOT/$mount"`
DISABLED=`jetpack config cyclecloud.mounts.${mount}.disabled || echo "False"`
if ! [[ "$DISABLED" == "True" || "$DISABLED" == "true" ]]
then
echo "mount being configured in chef"
return 0
fi
LUNS=${STORAGE_LUNS#"["}
LUNS=${LUNS%"]"}
LUNS=$(echo $LUNS | tr -d ",")
createdPartitions=""
for lun in $LUNS; do
disk=`readlink -f /dev/disk/azure/scsi1/lun$lun`
fdisk -l $disk || break
fdisk $disk << EOF
n
p
1
t
fd
w
EOF
createdPartitions="$createdPartitions ${disk}1"
if [[ "$mount" == "meta" ]]; then
dev=$(basename $disk)
config_meta_device $dev
fi
done
sleep 10
mkdir -p $mountPoint
# Create RAID-0/RAID-5 volume
if [ -n "$createdPartitions" ]; then
devices=`echo $createdPartitions | wc -w`
mdadm --create /dev/$raidDevice --level $VOLUME_TYPE --raid-devices $devices $createdPartitions
sleep 10
mdadm /dev/$raidDevice
if [ "$filesystem" == "xfs" ]; then
mkfs -t $filesystem /dev/$raidDevice
export xfsuuid="UUID=`blkid |grep dev/$raidDevice |cut -d " " -f 2 |cut -c 7-42`"
#echo "$xfsuuid $mountPoint $filesystem rw,noatime,attr2,inode64,nobarrier,sunit=1024,swidth=4096,nofail 0 2" >> /etc/fstab
echo "$xfsuuid $mountPoint $filesystem rw,noatime,attr2,inode64,nobarrier,sunit=1024,swidth=4096,nofail 0 2" >> /etc/fstab
else
#mkfs.ext4 -i 2048 -I 512 -J size=400 -Odir_index,filetype /dev/$raidDevice
mkfs.ext4 $FS_OPTS /dev/$raidDevice
sleep 5
tune2fs -o user_xattr /dev/$raidDevice
export ext4uuid="UUID=`blkid |grep dev/$raidDevice |cut -d " " -f 2 |cut -c 7-42`"
#echo "$ext4uuid $mountPoint $filesystem noatime,nodiratime,nobarrier,nofail 0 2" >> /etc/fstab
echo "$ext4uuid $mountPoint $filesystem noatime,nodiratime,nobarrier,nofail 0 2" >> /etc/fstab
fi
sleep 10
mount -a
fi
if [[ "$mount" == "meta" ]]; then
config_meta_host
fi
}
config_meta_device()
{
dev=$1
echo deadline > /sys/block/${dev}/queue/scheduler
echo 128 > /sys/block/${dev}/queue/nr_requests
echo 128 > /sys/block/${dev}/queue/read_ahead_kb
echo 256 > /sys/block/${dev}/queue/max_sectors_kb
}
config_meta_host()
{
echo 5 > /proc/sys/vm/dirty_background_ratio
echo 20 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 262144 > /proc/sys/vm/min_free_kbytes
echo 1 > /proc/sys/vm/zone_reclaim_mode
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag
}
set_hostname()
{
HOSTNAME=`jetpack config fqdn` || return 1
sed -i 's|^HOSTNAME.*|HOSTNAME='$HOSTNAME'|g' /etc/sysconfig/network
}
setup_storage_disks "meta" "md10"
setup_storage_disks "storage" "md20"
set_hostname
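After the script runs, a quick sanity check can confirm both arrays were assembled and mounted;
a sketch using the device names and default mount points from the script above:

```bash
# md10 (meta) and md20 (storage) should both show as active arrays.
cat /proc/mdstat

# Verify RAID level and member partitions for the storage array.
mdadm --detail /dev/md20

# Confirm both arrays are mounted under the default BeeGFS root.
df -h /data/beegfs/meta /data/beegfs/storage
```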

View file

@@ -16,16 +16,21 @@ Category = Filesystems
KeyPairLocation = ~/.ssh/cyclecloud.pem
[[[configuration]]]
[[[cluster-init beegfs:default:1.1.0]]]
beegfs.root_dir = $BeeGFSRoot
cyclecloud.maintenance_converge.enabled = false
cyclecloud.selinux.policy = permissive
[[[cluster-init beegfs:default:1.2.0]]]
[[node manager]]
MachineType = $ManagerVMType
IsReturnProxy = $ReturnProxy
[[[configuration]]]
cyclecloud.maintenance_converge.enabled = true
[[[cluster-init beegfs:manager:1.1.0]]]
[[[cluster-init beegfs:manager:1.2.0]]]
[[[network-interface eth0]]]
AssociatePublicIpAddress = $UsePublicNetwork
@@ -41,35 +46,74 @@ Category = Filesystems
# The initial number of cores of this type to start when the cluster starts
InitialCount= $InitialStorageCount
Azure.AllocationMethod = Standalone
[[[volume disk1]]]
Size = $MetadataDiskSize
SSD = true
Mount = meta
[[[volume disk1]]]
Size = $StorageDiskSize
SSD = true
Mount = beegfs
Persistent = true
[[[volume disk2]]]
Size = $MetadataDiskSize
SSD = true
Mount = meta
[[[volume disk2]]]
Size = $StorageDiskSize
SSD = true
Mount = beegfs
Persistent = true
[[[configuration cyclecloud.mounts.meta]]]
fs_type = xfs
raid_level = 0
disabled = true
[[[configuration cyclecloud.mounts.beegfs]]]
mountpoint = $BeeGFSRoot
fs_type = ext4
raid_level = 0
[[[volume disk3]]]
Size = $StorageDiskSize
SSD = true
Mount = storage
[[[cluster-init beegfs:storage:1.1.0]]]
[[[volume disk4]]]
Size = $StorageDiskSize
SSD = true
Mount = storage
[[[volume disk5]]]
Size = $StorageDiskSize
SSD = true
Mount = storage
[[[volume disk6]]]
Size = $StorageDiskSize
SSD = true
Mount = storage
[[[configuration cyclecloud.mounts.storage]]]
fs_type = ext4
raid_level = 0
options = noatime,nodiratime,nobarrier,nofail
fs_options = " -J size=400 -Odir_index,filetype"
disabled = true
[[[cluster-init beegfs:storage:1.2.0]]]
[[nodearray metadata]]
MachineType = $MetadataVMType
# The initial number of cores of this type to start when the cluster starts
InitialCount= $InitialMetadataCount
Azure.AllocationMethod = Standalone
[[[cluster-init beegfs:metadata:1.1.0]]]
[[[volume disk1]]]
Size = $MetadataDiskSize
SSD = true
Mount = meta
[[[volume disk2]]]
Size = $MetadataDiskSize
SSD = true
Mount = meta
[[[configuration cyclecloud.mounts.meta]]]
fs_type = xfs
raid_level = 0
disabled = true
[[[cluster-init beegfs:metadata:1.2.0]]]
[[nodearray client]]
MachineType = $ClientVMType
@@ -77,8 +121,17 @@ Category = Filesystems
# The initial number of cores of this type to start when the cluster starts
InitialCount= $InitialClientCount
[[[cluster-init beegfs:client:1.1.0]]]
[[[cluster-init beegfs:client:1.2.0]]]
[[nodearray monitor]]
MachineType = $MonitorVMType
# The initial number of cores of this type to start when the cluster starts
InitialCount= $InitialClientCount
MaxCount = 1
[[[cluster-init beegfs:mon:1.2.0]]]
[parameters About]
Order = 1
@@ -115,7 +168,7 @@ Order = 10
Label = Metadata VM
Description = The VM type for metadata nodes
ParameterType = Cloud.MachineType
DefaultValue = Standard_D2_v3
DefaultValue = Standard_DS3_v2
[[[parameter StorageVMType]]]
Label = Storage VM
@@ -129,6 +182,13 @@ Order = 10
ParameterType = Cloud.MachineType
DefaultValue = Standard_D2_v3
[[[parameter MonitorVMType]]]
Label = Monitor VM
Description = The VM type for the BeeGFS monitor node.
ParameterType = Cloud.MachineType
DefaultValue = Standard_D2_v3
[[parameters Networking]]
Order = 40
@@ -157,7 +217,12 @@ Order = 20
Label = OSS Disk Size (GB)
Description = Size (GB) of each premium disk attached to each OSS. Disks are combined in a RAID0 configuration.
DefaultValue = 1024
[[[parameter MetadataDiskSize]]]
Label = MDS Disk Size (GB)
Description = Size (GB) of each of the 2 premium disks attached to each MDS. Disks are combined in a RAID0 configuration.
DefaultValue = 1024
[[[parameter BeeGFSRoot ]]]
Label = BeeGFS Dir
Description = The root directory for BeeGFS data on the MDT, MGS and OSS servers
@@ -193,13 +258,14 @@ Order = 20
[[parameters Software]]
Description = "Specify the scheduling software, and base OS installed on all nodes, and optionally the cluster-init and chef versions from your Locker."
Order = 10
Hidden = true
[[[parameter ImageName]]]
Label = Base OS
ParameterType = Cloud.Image
Config.OS = linux
DefaultValue = cycle.image.centos7
Config.Filter := Package in {"cycle.image.centos7", "cycle.image.centos6", "cycle.image.ubuntu18"}
Config.Filter := Package in {"cycle.image.centos7"}
[[[parameter MasterClusterInitSpecs]]]
Label = Master Cluster-Init