雜思集 (Miscellaneous Thoughts)

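Notes:

If the server is part of a DAG, suspend the DAG member before upgrading to a new Cumulative Update
Upgrade the DAG members one after another, with no more than one day between them
Run Setup from a CMD window opened with Run as administrator; otherwise you may hit permission errors, failed file updates, or other abnormal behavior
Back up first (Exchange and the domain controllers) in case the upgrade fails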
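The one-day rule for DAG members is easy to check against your own change log. A minimal Python sketch, assuming you record when each member's CU upgrade finished; the timestamps below are hypothetical:

```python
from datetime import datetime, timedelta

# Longest gap allowed between upgrading consecutive DAG members (from the note above).
MAX_GAP = timedelta(days=1)


def gaps_too_long(upgrade_times):
    """Return (earlier, later) server pairs whose upgrades finished more than MAX_GAP apart.

    upgrade_times maps each DAG member's name to the datetime its CU upgrade finished.
    """
    ordered = sorted(upgrade_times.items(), key=lambda item: item[1])
    return [
        (prev_name, name)
        for (prev_name, prev_time), (name, finished) in zip(ordered, ordered[1:])
        if finished - prev_time > MAX_GAP
    ]


# Hypothetical upgrade log for a two-member DAG: 25.5 hours apart, so the pair is flagged.
upgrade_log = {
    "ExchangeServer": datetime(2021, 3, 8, 20, 0),
    "ExchangeServer01": datetime(2021, 3, 9, 21, 30),
}
print(gaps_too_long(upgrade_log))  # [('ExchangeServer', 'ExchangeServer01')]
```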

Procedure:

Prepare Active Directory and Domains:

Cumulative Update pre-upgrade tasks:

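Install the latest Windows Updates

(Confirm that the installed .NET Framework version meets the requirements)
https://docs.microsoft.com/en-us/exchange/plan-and-deploy/supportability-matrix?view=exchserver-2016

Restart the server
Put the Exchange Server into maintenance mode
Stop the antivirus services
Stop the backup software services
Close other software and services (MMC, PowerShell, monitoring)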
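The installed .NET Framework 4.x version is recorded as the Release DWORD under HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full. A minimal Python sketch (not part of the original procedure) that maps that value to a version using the minimum Release values Microsoft publishes; compare the result against the supportability matrix for the CU you are installing. The registry read only runs on Windows:

```python
import sys

# Minimum "Release" values Microsoft documents for each .NET Framework 4.5+ version,
# checked from newest to oldest.
RELEASE_THRESHOLDS = [
    (533320, "4.8.1 or later"),
    (528040, "4.8"),
    (461808, "4.7.2"),
    (461308, "4.7.1"),
    (460798, "4.7"),
    (394802, "4.6.2"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def release_to_version(release):
    """Map the .NET 4.x Release registry value to a version string, or None if below 4.5."""
    for minimum, version in RELEASE_THRESHOLDS:
        if release >= minimum:
            return version
    return None


def read_release():
    """Read the Release value from the local registry (Windows only)."""
    import winreg  # standard library module, available only on Windows

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _ = winreg.QueryValueEx(key, "Release")
    return value


if __name__ == "__main__" and sys.platform == "win32":
    print(release_to_version(read_release()))
```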

Run the Cumulative Update:

Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:Upgrade [/DomainController:<ServerFQDN>] [/EnableErrorReporting]
Restart the server

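Cumulative Update post-upgrade tasks:

Check the event logs for errors or warnings
Confirm that all services have started
Take the Exchange Server out of maintenance mode
Verify that Exchange is working normally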

Prepare Active Directory and Domains:

Setup.exe /IAcceptExchangeServerLicenseTerms /PrepareSchema

Setup.exe /IAcceptExchangeServerLicenseTerms /PrepareAD

Setup.exe /IAcceptExchangeServerLicenseTerms /PrepareAllDomains

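Cumulative Update pre-upgrade tasks:

Put the Exchange Server into maintenance mode

(Replace ExchangeServer with the computer name of the Exchange server being upgraded, ExchangeServer01 with the computer name of another Mailbox server that will take over its pending messages, EXDAG01 with the DAG name, and coretronic.com with your own domain.)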

## 1. Drain the mail queues on the server
Set-ServerComponentState -Identity ExchangeServer -Component HubTransport -State Draining -Requester Maintenance

## 2. Restart the transport services so the Draining state takes effect
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport

## 3. Redirect pending messages to another Mailbox server.
Redirect-Message -Server ExchangeServer.coretronic.com -Target ExchangeServer01.coretronic.com

## 4. If the server is a DAG member, pause its cluster node first
Suspend-ClusterNode ExchangeServer

## 5. Move all the active databases off of the server to another DAG member.
Set-MailboxServer ExchangeServer -DatabaseCopyActivationDisabledAndMoveNow $True

## 6. Look at the status of the database auto activation policy. Keep this handy. We will need this when we take the server out of maintenance mode.
Get-MailboxServer ExchangeServer | Select DatabaseCopyAutoActivationPolicy

## 7. Block the server from hosting active database copies
Set-MailboxServer ExchangeServer -DatabaseCopyAutoActivationPolicy Blocked

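## 8. Check whether the server is ready for maintenance mode (no database copies should still be Mounted, and the queues should be drained)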
Get-MailboxDatabaseCopyStatus -Server ExchangeServer | Where {$_.Status -eq "Mounted"}
Get-Queue

## 9. Place the server in maintenance mode.
Set-ServerComponentState ExchangeServer -Component ServerWideOffline -State Inactive -Requester Maintenance

## Verify the current state
Get-ServerComponentState ExchangeServer | Select Component, State
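Once ServerWideOffline is Inactive, every component is normally Inactive except Monitoring and RecoveryActionsEnabled. A small Python sketch that scans the Component/State table printed above for anything still Active; the sample output is hypothetical and shortened:

```python
# Components that normally stay Active while ServerWideOffline is Inactive.
EXPECTED_ACTIVE = {"Monitoring", "RecoveryActionsEnabled"}


def parse_component_states(text):
    """Turn the Component/State table printed by PowerShell into a dict."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed underline row.
        if len(parts) != 2 or parts[0] == "Component" or parts[0].startswith("-"):
            continue
        states[parts[0]] = parts[1]
    return states


def still_active(text):
    """Return components that are Active but should be Inactive in maintenance mode."""
    return sorted(
        name
        for name, state in parse_component_states(text).items()
        if state == "Active" and name not in EXPECTED_ACTIVE
    )


# Hypothetical, shortened output of: Get-ServerComponentState ExchangeServer | Select Component, State
SAMPLE = """
Component                State
---------                -----
ServerWideOffline        Inactive
HubTransport             Inactive
FrontendTransport        Inactive
Monitoring               Active
RecoveryActionsEnabled   Active
"""
print(still_active(SAMPLE))  # []
```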

Stop the antivirus services
Stop the backup software services
Close other software and services (MMC, PowerShell, monitoring)

Run the Cumulative Update:

Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:Upgrade [/DomainController:<ServerFQDN>] [/EnableErrorReporting]
Restart the server

Cumulative Update post-upgrade tasks:

Check the event logs for errors or warnings
Confirm that all services have started
Take the Exchange Server out of maintenance mode

## 1. Take the server out of maintenance mode
Set-ServerComponentState ExchangeServer -Component ServerWideOffline -State Active -Requester Maintenance

## 2. Unpause the DAG node
Resume-ClusterNode ExchangeServer

## 3. Set the server to allow database copy activation
Set-MailboxServer ExchangeServer -DatabaseCopyActivationDisabledAndMoveNow $False

## 4. Set the database auto activation policy back to its original setting (the value recorded in step 6; Unrestricted is the default)
Set-MailboxServer ExchangeServer -DatabaseCopyAutoActivationPolicy Unrestricted

## 5. Set the hub transport component back to active to allow it to accept connections.
Set-ServerComponentState ExchangeServer -Component HubTransport -State Active -Requester Maintenance

## 6. Microsoft recommends restarting the transport services so the change is picked up immediately.
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport

## 7. If the server was a DAG member and you moved all the active copies off of the server, you can easily move them back based on mount preference by running the RedistributeActiveDatabases.ps1 script

cd $exscripts
.\RedistributeActiveDatabases.ps1 -DagName "EXDAG01" -BalanceDbsByActivationPreference -SkipMoveSuppressionChecks -Confirm:$false

Verify that Exchange is working normally:

Get-Queue

Get-ClusterNode ExchangeServer

Get-ClusterNode 

Test-ServiceHealth ExchangeServer

Get-ExchangeServer | Test-ServiceHealth

Test-MAPIConnectivity -Server ExchangeServer

Get-ExchangeServer | Test-MAPIConnectivity

Get-MailboxDatabaseCopyStatus -Server "ExchangeServer" | Sort Name | Select Name, Status, Contentindexstate

Get-MailboxDatabaseCopyStatus * | Sort Name | Select Name, Status, Contentindexstate

Test-ReplicationHealth -Server ExchangeServer

Get-DatabaseAvailabilityGroup | Select -ExpandProperty:Servers | Test-ReplicationHealth | Sort Name

Get-MailboxServer ExchangeServer | Select Name, DatabaseCopyAutoActivationPolicy

Get-MailboxServer | Select Name, DatabaseCopyAutoActivationPolicy
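When reading the database copy status, active copies should be Mounted, passive copies Healthy, and the content index Healthy. A minimal Python sketch that flags rows outside that pattern in the Name/Status/ContentIndexState table; the sample output is hypothetical:

```python
# Active copies should be Mounted and passive copies Healthy; the content index should be Healthy.
GOOD_STATUSES = {"Mounted", "Healthy"}


def unhealthy_copies(text):
    """Return (name, status, content_index_state) rows that need attention."""
    problems = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed underline row.
        if len(parts) != 3 or parts[0] == "Name" or parts[0].startswith("-"):
            continue
        name, status, index_state = parts
        if status not in GOOD_STATUSES or index_state != "Healthy":
            problems.append((name, status, index_state))
    return problems


# Hypothetical output of: Get-MailboxDatabaseCopyStatus * | Sort Name | Select Name, Status, ContentIndexState
SAMPLE = r"""
Name                     Status    ContentIndexState
----                     ------    -----------------
DB01\ExchangeServer      Mounted   Healthy
DB01\ExchangeServer01    Healthy   Healthy
DB02\ExchangeServer01    Healthy   Crawling
"""
print(unhealthy_copies(SAMPLE))
```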

Host names and IP addresses

Host     Role          IP              OS
K8S01    Master        192.168.8.53    Ubuntu 18.04
K8S02    Node          192.168.8.54    Ubuntu 18.04
K8S03    Node          192.168.8.55    Ubuntu 18.04
SVAI01   Node (GPU)    192.168.3.48    Ubuntu 18.04
SVAI02   Node (GPU)    192.168.3.49    Ubuntu 18.04

Before installation

Set the host name

sudo hostnamectl set-hostname k8s-master

(hostnamectl already writes /etc/hostname; opening it only confirms the change)

sudo vi /etc/hostname

Add every cluster host to /etc/hosts
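The /etc/hosts entries can be generated from the table above. kubeadm registers each machine under its host name by default, and Kubernetes node names must be lowercase, so this sketch lower-cases the names. Treat the names as placeholders: use whatever each machine actually reports (for example k8s-master after the hostnamectl step):

```python
# Host table from the list above; lower-cased host names are an assumption.
NODES = [
    ("K8S01", "192.168.8.53"),
    ("K8S02", "192.168.8.54"),
    ("K8S03", "192.168.8.55"),
    ("SVAI01", "192.168.3.48"),
    ("SVAI02", "192.168.3.49"),
]


def hosts_entries(nodes):
    """Return /etc/hosts lines of the form "IP<TAB>hostname", one per node."""
    return "".join(f"{ip}\t{name.lower()}\n" for name, ip in nodes)


print(hosts_entries(NODES), end="")
```

Append the printed lines to /etc/hosts on every node so all machines resolve each other the same way.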

Disable the firewall (here by flushing all iptables rules)

sudo iptables -F

Disable system swap

sudo swapoff -a

Edit /etc/fstab so swap is not mounted again at boot

sudo sed -e '/swap/ s/^#*/#/' -i /etc/fstab
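The sed expression only touches lines containing "swap", and on those it replaces any run of leading # characters with a single #. A Python equivalent, useful for previewing the change on a copy of fstab before editing the real file; the sample fstab lines are made up:

```python
import re


def comment_swap_lines(fstab_text):
    """Python equivalent of: sed -e '/swap/ s/^#*/#/'

    On every line containing "swap", any run of leading '#' characters becomes exactly
    one '#', so active swap entries are commented out and existing comments stay
    single-commented. Other lines are left untouched.
    """
    lines = fstab_text.splitlines(keepends=True)
    return "".join(
        re.sub(r"^#*", "#", line, count=1) if "swap" in line else line for line in lines
    )


before = "UUID=1234 / ext4 errors=remount-ro 0 1\n/swap.img none swap sw 0 0\n"
print(comment_swap_lines(before), end="")
```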

Confirm that swap is off (the Swap row should show 0)

free -m
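The same check can be scripted by reading /proc/swaps, which lists one active swap area per line below a header row. A small Python sketch (an alternative to reading free -m, not from the original post):

```python
from pathlib import Path


def active_swap(proc_swaps_text):
    """Return the swap files/devices listed in /proc/swaps (the first line is a header)."""
    rows = proc_swaps_text.strip().splitlines()[1:]
    return [row.split()[0] for row in rows if row.strip()]


if __name__ == "__main__":
    swaps_file = Path("/proc/swaps")  # only present on Linux
    if swaps_file.exists():
        devices = active_swap(swaps_file.read_text())
        print("swap is off" if not devices else f"swap still active: {devices}")
```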

Update the system packages to the latest versions on all nodes:

sudo apt-get update

sudo apt-get upgrade

sudo apt-get install linux-image-extra-virtual

sudo reboot

Add a user for managing the Kubernetes cluster:

sudo useradd -s /bin/bash -m kube

sudo passwd kube

(the password used in this test: kube)

sudo usermod -aG sudo kube

echo "kube ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/kube

Install Docker Engine

First make sure any old versions of the Docker engine have been removed:

sudo apt-get remove docker docker-engine docker.i

Install the prerequisite packages

sudo apt-get install apt-transport-https ca-certificates curl software-properties-common

Install Docker

sudo apt install docker.io

sudo systemctl enable docker

Install Docker CE

Add the GPG key

https_proxy=192.168.1.88:3128 wget https://download.docker.com/linux/ubuntu/gpg -O docker.key

sudo apt-key add docker.key

Add the package repository

(Either create a file for the Docker repository at /etc/apt/sources.list.d/docker.list, or add the repository with add-apt-repository as below.)

sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

Install Docker CE

sudo apt-get install docker-ce

Testing with hello-world produced an error

sudo docker run hello-world

First create the following directory

sudo mkdir /etc/systemd/system/docker.service.d

Then create an http-proxy.conf file

sudo vi /etc/systemd/system/docker.service.d/http-proxy.conf

with the following content:

[Service]
Environment="HTTP_PROXY=http://192.168.2.91:80/"
Environment="HTTPS_PROXY=http://192.168.2.91:80/"

sudo systemctl daemon-reload

sudo systemctl show --property Environment docker

sudo systemctl restart docker

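Running sudo docker run hello-world again still failed, but with a different error code. The fix was docker login.

Register a Docker account, then run docker login

Run it once more; this time it finally succeeded:

sudo docker run hello-world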

Kubernetes installation

Add the signing key and repository

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

Install the K8S packages

sudo apt install kubeadm kubectl kubelet

Initialize the master

sudo kubeadm init --kubernetes-version=v1.13.4 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
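The Pod range (10.244.0.0/16, which matches flannel's default network) and the Service range (10.96.0.0/12, kubeadm's default) must not overlap each other or the hosts' own networks. A quick check with Python's ipaddress module; the node subnets are assumed to be /24s, since the post only lists host addresses:

```python
import ipaddress
from itertools import combinations

# Ranges from the kubeadm init command; the two node subnets are assumed /24s.
NETWORKS = {
    "pod-network-cidr": "10.244.0.0/16",
    "service-cidr": "10.96.0.0/12",
    "node subnet 192.168.8.x": "192.168.8.0/24",
    "node subnet 192.168.3.x": "192.168.3.0/24",
}


def overlapping(networks):
    """Return every pair of named ranges that overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [(a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])]


print(overlapping(NETWORKS))  # [] means no range collides with another
```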

If swap has not been disabled, kubeadm's preflight checks fail with an error.

Output:

neo@u1810:~$ sudo kubeadm init --kubernetes-version=v1.13.4 --pod-network-cidr=10.244.0.0/16 service-cidr=10.96.0.0/12

[init] Using Kubernetes version: v1.13.4

[preflight] Running pre-flight checks

[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06

[preflight] Pulling images required for setting up a Kubernetes cluster

[preflight] This might take a minute or two, depending on the speed of your internet connection

[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet-start] Activating the kubelet service

[certs] Using certificateDir folder "/etc/kubernetes/pki"

[certs] Generating "ca" certificate and key

[certs] Generating "apiserver-kubelet-client" certificate and key

[certs] Generating "apiserver" certificate and key

[certs] apiserver serving cert is signed for DNS names [u1810 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.8.53]

[certs] Generating "etcd/ca" certificate and key

[certs] Generating "etcd/server" certificate and key

[certs] etcd/server serving cert is signed for DNS names [u1810 localhost] and IPs [192.168.8.53 127.0.0.1 ::1]

[certs] Generating "etcd/peer" certificate and key

[certs] etcd/peer serving cert is signed for DNS names [u1810 localhost] and IPs [192.168.8.53 127.0.0.1 ::1]

[certs] Generating "etcd/healthcheck-client" certificate and key

[certs] Generating "apiserver-etcd-client" certificate and key

[certs] Generating "front-proxy-ca" certificate and key

[certs] Generating "front-proxy-client" certificate and key

[certs] Generating "sa" key and public key

[kubeconfig] Using kubeconfig folder "/etc/kubernetes"

[kubeconfig] Writing "admin.conf" kubeconfig file

[kubeconfig] Writing "kubelet.conf" kubeconfig file

[kubeconfig] Writing "controller-manager.conf" kubeconfig file

[kubeconfig] Writing "scheduler.conf" kubeconfig file

[control-plane] Using manifest folder "/etc/kubernetes/manifests"

[control-plane] Creating static Pod manifest for "kube-apiserver"

[control-plane] Creating static Pod manifest for "kube-controller-manager"

[control-plane] Creating static Pod manifest for "kube-scheduler"

[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s

[apiclient] All control plane components are healthy after 31.014621 seconds

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "u1810" as an annotation

[mark-control-plane] Marking the node u1810 as control-plane by adding the label "node-role.kubernetes.io/master=''"

[mark-control-plane] Marking the node u1810 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

[bootstrap-token] Using token: rnrbe5.tq9bglome3cmceci

[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles

[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials

[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token

[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster

[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace

[addons] Applied essential addon: CoreDNS

[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.

Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node

as root: (the command below, including its token, is what adds a Node)

kubeadm join 192.168.8.53:6443 --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f

Create the user kubeconfig

mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

Test

kubectl get componentstatus

kubectl get nodes

Install the Pod network -- flannel

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Check that the master is healthy

kubectl get nodes

kubectl get pods --all-namespaces

Add Nodes to the K8S cluster

On each new Node, run the join command printed at the end of kubeadm init:

sudo kubeadm join 192.168.8.53:6443 --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f

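#The token expires after 24 hours. To add more Nodes later, create a new token on the master (adding --print-join-command prints the complete join command)

sudo kubeadm token create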
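A kubeadm bootstrap token has the form "6 characters.16 characters" using lowercase letters and digits, and by default it stops working 24 hours after creation (kubeadm token list shows the actual expiry). A minimal Python sketch that validates a token string and computes its default expiry; the creation time below is hypothetical:

```python
import re
from datetime import datetime, timedelta

# "<6 chars>.<16 chars>" from [a-z0-9]: the public token ID, then the secret.
TOKEN_PATTERN = re.compile(r"([a-z0-9]{6})\.([a-z0-9]{16})")
DEFAULT_TTL = timedelta(hours=24)  # kubeadm's default token lifetime


def token_id(token):
    """Return the token ID, or raise ValueError if the string is not a bootstrap token."""
    match = TOKEN_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f"not a kubeadm bootstrap token: {token!r}")
    return match.group(1)


def expires_at(created, ttl=DEFAULT_TTL):
    """When a token created at `created` stops working, assuming the default TTL."""
    return created + ttl


print(token_id("rnrbe5.tq9bglome3cmceci"))  # rnrbe5
print(expires_at(datetime(2019, 3, 18, 9, 0)))  # 2019-03-19 09:00:00
```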

neo@k8s02:~$ sudo kubeadm join 192.168.8.53:6443 --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f

[preflight] Running pre-flight checks

[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06

[discovery] Trying to connect to API Server "192.168.8.53:6443"

[discovery] Created cluster-info discovery client, requesting info from "https://192.168.8.53:6443"

[discovery] Requesting info from "https://192.168.8.53:6443" again to validate TLS against the pinned public key

[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.8.53:6443"

[discovery] Successfully established connection with API Server "192.168.8.53:6443"

[join] Reading configuration from the cluster...

[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[kubelet-start] Activating the kubelet service

[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s02" as an annotation

This node has joined the cluster:

* Certificate signing request was sent to apiserver and a response was received.

* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

On the master, run kubectl get nodes to confirm that the new Node appears

Check whether the cluster is healthy

kubectl get cs

kubectl cluster-info

kubectl version --short=true
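A freshly joined Node can show NotReady for a while until its Pod network comes up. A small Python sketch that lists the nodes not yet Ready from kubectl get nodes output; the sample output is hypothetical:

```python
def not_ready_nodes(get_nodes_output):
    """Return the names of nodes whose STATUS is not exactly "Ready".

    Cordoned nodes ("Ready,SchedulingDisabled") are reported too, since they will not
    accept new Pods.
    """
    names = []
    for row in get_nodes_output.strip().splitlines()[1:]:  # skip the NAME/STATUS header
        parts = row.split()
        if parts and parts[1] != "Ready":
            names.append(parts[0])
    return names


# Hypothetical output of kubectl get nodes shortly after a join.
SAMPLE = """
NAME    STATUS     ROLES    AGE   VERSION
u1810   Ready      master   40m   v1.13.4
k8s02   NotReady   <none>   30s   v1.13.4
"""
print(not_ready_nodes(SAMPLE))  # ['k8s02']
```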

K8S: installing the NVIDIA Device Plugin

Following the official steps at https://github.com/NVIDIA/k8s-device-plugin

Prerequisites

The list of prerequisites for running the NVIDIA device plugin is described below:

NVIDIA drivers ~= 361.93
nvidia-docker version > 2.0 (see how to install and its prerequisites)
docker configured with nvidia as the default runtime.
Kubernetes version >= 1.11
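The version prerequisites compare numerically, not as strings. A minimal Python sketch that treats each listed version as a minimum (an assumption for the "~=" driver requirement) and checks the versions used in this install. On a GPU node, nvidia-smi --query-gpu=driver_version --format=csv,noheader prints the installed driver version:

```python
def version_tuple(text):
    """Turn "418.43" or "v1.13.4" into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets(actual, minimum):
    """True if `actual` is the same as or newer than `minimum`."""
    return version_tuple(actual) >= version_tuple(minimum)


# Versions from this install: driver 418.43 and Kubernetes v1.13.4.
checks = {
    "NVIDIA driver": ("418.43", "361.93"),
    "Kubernetes": ("v1.13.4", "1.11"),
}
for name, (actual, minimum) in checks.items():
    print(f"{name}: {actual} >= {minimum}? {meets(actual, minimum)}")
```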

#Check which NVIDIA hardware is present

lspci | grep NVIDIA

#Install the official NVIDIA driver manually

https://www.nvidia.com/Download/

The GPU used this time is a GeForce RTX 2080 Ti

sudo chmod +x NVIDIA-Linux-x86_64-418.43.run

sudo ./NVIDIA-Linux-x86_64-418.43.run -no-x-check -no-nouveau-check -no-opengl-files

#Load the NVIDIA driver module

sudo modprobe nvidia

#Show the GPU information

nvidia-smi

#On each Node that has a GPU, enable the nvidia runtime as the default runtime

#Edit /etc/docker/daemon.json as follows:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
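If daemon.json already holds other settings, overwriting it with the snippet above would drop them. A minimal Python sketch that merges the nvidia runtime into an existing file instead; run it with sudo against the real path, or try it on a temporary copy as the demo does:

```python
import json
import tempfile
from pathlib import Path


def enable_nvidia_default_runtime(config_path="/etc/docker/daemon.json"):
    """Add the nvidia runtime to daemon.json and make it the default, keeping other keys."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("runtimes", {})["nvidia"] = {
        "path": "/usr/bin/nvidia-container-runtime",
        "runtimeArgs": [],
    }
    config["default-runtime"] = "nvidia"
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config


if __name__ == "__main__":
    # Demo on a throwaway copy rather than the real /etc/docker/daemon.json.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "daemon.json"
        demo.write_text('{"log-driver": "json-file"}')
        print(json.dumps(enable_nvidia_default_runtime(demo), indent=4))
```

Docker still has to be restarted afterwards for the new default runtime to apply.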

#Restart docker

sudo systemctl daemon-reload && sudo systemctl restart docker

#Enabling GPU Support in Kubernetes

#On the master, enable GPU support

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

#Restart kubelet

sudo systemctl daemon-reload && sudo systemctl restart kubelet

#Confirm that the GPU Nodes have GPU resources available for allocation

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
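In that custom-columns output, kubectl prints <none> for nodes that do not report an nvidia.com/gpu resource. A small Python sketch that turns the output into per-node GPU counts; the node names and counts in the sample are made up for illustration:

```python
def gpu_allocatable(custom_columns_output):
    """Parse NAME/GPU custom-columns output into {node_name: allocatable_gpus}.

    kubectl prints <none> when a node has no nvidia.com/gpu resource; that maps to 0.
    """
    result = {}
    for row in custom_columns_output.strip().splitlines()[1:]:  # skip the NAME GPU header
        name, gpu = row.split()
        result[name] = 0 if gpu == "<none>" else int(gpu)
    return result


# Hypothetical output of the command above.
SAMPLE = """
NAME     GPU
u1810    <none>
k8s02    <none>
svai01   1
svai02   1
"""
print(gpu_allocatable(SAMPLE))  # {'u1810': 0, 'k8s02': 0, 'svai01': 1, 'svai02': 1}
```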


Monday, April 12, 2021

Ubuntu: running fsck inside BusyBox

Monday, March 8, 2021

Exchange 2016 Upgrade Cumulative Update

Monday, March 18, 2019

Kubernetes 1.13.4 installation test -- NVIDIA Device Plugin
