Monday, April 12, 2021

Ubuntu: running fsck inside BusyBox


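After Ubuntu crashed and was force-rebooted,

the boot process dropped into the BusyBox initramfs shell.

Once fsck had finished checking the disk, I rebooted the server and it worked normally again.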
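A minimal sketch of the recovery typed at the `(initramfs)` prompt. The device name in the usage line is a placeholder (the boot message names the partition that needs checking), and the `run` helper with its `DRY_RUN` switch is a hypothetical addition so the sequence can be previewed without touching a disk.

```shell
#!/bin/sh
# Sketch only: /dev/sda1 in the usage line is a placeholder device.
# DRY_RUN=1 (a hypothetical switch for this sketch) prints commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repair_root_fs() {
    dev="${1:?usage: repair_root_fs DEVICE}"
    # -y answers "yes" to every repair question
    run fsck -y "$dev"
    rc=$?
    # fsck exit status: 1 = errors corrected, 2 = corrected and a reboot is needed;
    # 4 or higher means errors were left uncorrected or fsck itself failed
    if [ "$rc" -ge 4 ]; then
        echo "fsck could not repair $dev (exit status $rc)" >&2
        return "$rc"
    fi
    # BusyBox reboot: -f reboots immediately instead of going through init
    run reboot -f
}

# Usage at the (initramfs) prompt:
#   repair_root_fs /dev/sda1
```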

Monday, March 8, 2021

Exchange 2016 Upgrade Cumulative Update


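  1. When installing a Cumulative Update on a DAG deployment, pause the DAG first.
  2. Upgrade the DAG members one after another, leaving no more than one day between them.
  3. Run Setup from a CMD window with "Run as administrator" rights; otherwise you may hit permission errors, failed file updates, or other abnormal behavior.
  4. Back up first (Exchange & DC) in case the upgrade fails.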

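Pre-upgrade tasks for the Cumulative Update:
  • Put the Exchange Server into maintenance mode
  • Stop the antivirus software services
  • Stop the backup software services
  • Close other software and services (MMC, PowerShell, monitoring)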

Run the Cumulative Update
  • Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:Upgrade [/DomainController:<ServerFQDN>] [/EnableErrorReporting]
  • Restart the server

Post-upgrade tasks:
  • Check the event logs for errors or warnings
  • Confirm that all services have started
  • Take the Exchange Server out of maintenance mode
  • Verify that Exchange is working normally

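Put the Exchange Server into maintenance mode
(Replace ExchangeServer with your Exchange Server's computer name,
ExchangeServer01 with the name of the other Mailbox server that receives the redirected messages, and EXDAG01 with your DAG cluster name.)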

## 1. Drain the mail queues on the server
Set-ServerComponentState -Identity ExchangeServer -Component HubTransport -State Draining -Requester Maintenance

## 2. Restart the transport services so the change takes effect immediately
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport

## 3. Redirect pending messages to another Mailbox server.
Redirect-Message -Server ExchangeServer.coretronic.com -Target ExchangeServer01.coretronic.com

## 4. If the server is a DAG member, suspend it in the cluster first
Suspend-ClusterNode ExchangeServer

## 5. Move all the active databases off of the server to another DAG member.
Set-MailboxServer ExchangeServer -DatabaseCopyActivationDisabledAndMoveNow $True

## 6. Look at the status of the database auto activation policy. Keep this handy. We will need this when we take the server out of maintenance mode.
Get-MailboxServer ExchangeServer | Select DatabaseCopyAutoActivationPolicy

## 7. Block the server from hosting active database copies
Set-MailboxServer ExchangeServer -DatabaseCopyAutoActivationPolicy Blocked

## 8. Check the current state before entering maintenance mode; this should return no mounted database copies
Get-MailboxDatabaseCopyStatus -Server ExchangeServer | Where {$_.Status -eq "Mounted"}

## 9. Place the server in maintenance mode.
Set-ServerComponentState ExchangeServer -Component ServerWideOffline -State Inactive -Requester Maintenance

## Verify the current state
Get-ServerComponentState ExchangeServer | Select Component, State

  • Stop the antivirus software services
  • Stop the backup software services
  • Close other software and services (MMC, PowerShell, monitoring)

Run the Cumulative Update
  • Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:Upgrade [/DomainController:<ServerFQDN>] [/EnableErrorReporting]
  • Restart the server

Post-upgrade tasks:
  • Check the event logs for errors or warnings
  • Confirm that all services have started
  • Take the Exchange Server out of maintenance mode
## 1. Take the server out of maintenance mode
Set-ServerComponentState ExchangeServer -Component ServerWideOffline -State Active -Requester Maintenance

## 2. Unpause the DAG node
Resume-ClusterNode ExchangeServer

## 3. Set the server to allow database copy activation
Set-MailboxServer ExchangeServer -DatabaseCopyActivationDisabledAndMoveNow $False

## 4. Set the database auto activation policy back to its original setting (the value recorded in step 6; Unrestricted is the default)
Set-MailboxServer ExchangeServer -DatabaseCopyAutoActivationPolicy Unrestricted

## 5. Set the hub transport component back to active to allow it to accept connections.
Set-ServerComponentState ExchangeServer -Component HubTransport -State Active -Requester Maintenance

## 6. MS recommends that the transport services be restarted to help this change get picked up immediately.
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport

## 7. If the server was a DAG member and you moved all the active copies off of the server, you can easily move them back based on mount preference by running the RedistributeActiveDatabases.ps1 script

cd $exscripts
.\RedistributeActiveDatabases.ps1 -DagName "EXDAG01" -BalanceDbsByActivationPreference -SkipMoveSuppressionChecks -Confirm:$false

  • Verify that Exchange is working normally


Get-ClusterNode ExchangeServer


Test-ServiceHealth ExchangeServer

Get-ExchangeServer | Test-ServiceHealth

Test-MAPIConnectivity -Server ExchangeServer

Get-ExchangeServer | Test-MAPIConnectivity

Get-MailboxDatabaseCopyStatus -Server "ExchangeServer" | Sort Name | Select Name, Status, Contentindexstate

Get-MailboxDatabaseCopyStatus * | Sort Name | Select Name, Status, Contentindexstate

Test-ReplicationHealth -Server ExchangeServer

Get-DatabaseAvailabilityGroup | Select -ExpandProperty:Servers | Test-ReplicationHealth | Sort Name

Get-MailboxServer ExchangeServer | Select Name, DatabaseCopyAutoActivationPolicy

Get-MailboxServer | Select Name, DatabaseCopyAutoActivationPolicy

Monday, March 18, 2019

Kubernetes 1.13.4 installation test: NVIDIA Device Plugin

Machine names and corresponding IPs

K8S01    Master        Ubuntu 18.04
K8S02    Node          Ubuntu 18.04
K8S03    Node          Ubuntu 18.04
SVAI01   Node (GPU)    Ubuntu 18.04
SVAI02   Node (GPU)    Ubuntu 18.04


  • Set the hostname

sudo hostnamectl set-hostname k8s-master

sudo vi /etc/hostname

  • Add all the hosts to /etc/hosts
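A small helper for this step, as a sketch. The IP addresses in the usage lines are hypothetical because the post does not list them; the function appends an entry only when the host name is not already present, so it can be re-run safely on every machine.

```shell
#!/bin/sh
# add_host FILE IP NAME: append "IP<TAB>NAME" unless NAME is already listed.
# Comment lines are ignored; matching is on whole host-name fields, so
# "k8s" does not match "k8s-master".
add_host() {
    file="$1" ip="$2" name="$3"
    if awk -v n="$name" '
        $1 ~ /^#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Usage on every machine (run as root; the IPs are hypothetical):
#   add_host /etc/hosts 192.168.1.11 K8S01
#   add_host /etc/hosts 192.168.1.12 K8S02
```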

  • Disable the firewall (flush the iptables rules)

sudo iptables -F

  • Disable swap

sudo swapoff -a

Edit /etc/fstab so swap is not mounted again at boot:

sudo sed -e '/swap/ s/^#*/#/' -i /etc/fstab


free -m
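The `sed` edit above can be wrapped in a function that takes the file path, which makes it easy to try on a copy before touching the real `/etc/fstab`. This is a sketch; the `.bak` backup is an addition of mine, and `sed -i` assumes GNU sed as shipped with Ubuntu.

```shell
#!/bin/sh
# comment_out_swap [FILE]: prefix every line mentioning "swap" with exactly one "#".
# A copy of the original is kept as FILE.bak.
comment_out_swap() {
    fstab="${1:-/etc/fstab}"
    cp -- "$fstab" "$fstab.bak" || return
    # s/^#*/#/ turns "", "#" or "##..." at the start of the line into a single "#"
    sed -e '/swap/ s/^#*/#/' -i "$fstab"
}

# Usage (as root):
#   comment_out_swap /etc/fstab
```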

  • Update the system packages to the latest versions on all nodes:

sudo apt-get update

sudo apt-get upgrade

sudo apt-get install linux-image-extra-virtual

sudo reboot

  • Add a user to manage the Kubernetes cluster:

sudo useradd -s /bin/bash -m kube

sudo passwd kube   # password used in this test: kube

sudo usermod -aG sudo kube

echo "kube ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/kube

Install Docker Engine

sudo apt-get remove docker docker-engine docker.io


sudo apt-get install apt-transport-https ca-certificates curl software-properties-common


sudo apt install docker.io

sudo systemctl enable docker

Install Docker CE


https_proxy= wget https://download.docker.com/linux/ubuntu/gpg -O docker.key

sudo apt-key add docker.key


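Add the source

Add the Docker CE repository (the Aliyun mirror is used here). add-apt-repository writes the entry to /etc/apt/sources.list; alternatively, put the same deb line in a new file, /etc/apt/sources.list.d/docker.list.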


sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

Install Docker CE:

sudo apt-get install docker-ce

Testing with hello-world immediately produced an error:

sudo docker run hello-world

First create this directory:

sudo mkdir /etc/systemd/system/docker.service.d

Then add an http-proxy.conf file:

sudo vi /etc/systemd/system/docker.service.d/http-proxy.conf
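The proxy values that went into this file are not shown in the post. As a sketch, the drop-in follows the `[Service]` / `Environment=` layout from Docker's systemd documentation; the proxy URL and the default `NO_PROXY` list below are hypothetical, and the same proxy is assumed for HTTP and HTTPS.

```shell
#!/bin/sh
# docker_proxy_dropin PROXY_URL [NO_PROXY]: print a systemd drop-in for dockerd.
docker_proxy_dropin() {
    proxy="${1:?usage: docker_proxy_dropin PROXY_URL [NO_PROXY]}"
    no_proxy="${2:-localhost,127.0.0.1}"
    printf '[Service]\n'
    printf 'Environment="HTTP_PROXY=%s"\n' "$proxy"
    printf 'Environment="HTTPS_PROXY=%s"\n' "$proxy"
    printf 'Environment="NO_PROXY=%s"\n' "$no_proxy"
}

# Usage (hypothetical proxy address), then daemon-reload and restart docker:
#   docker_proxy_dropin http://proxy.example.com:3128 |
#       sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
```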





sudo systemctl daemon-reload

sudo systemctl show --property Environment docker

sudo systemctl restart docker

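Running sudo docker run hello-world again still failed, but with a different error code. The fix was docker login.

Register a Docker account, then log in once:

sudo docker login

Run it once more, and it finally succeeds:

sudo docker run hello-world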

Kubernetes installation

Add the signing key and the repository

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

Install the Kubernetes packages

sudo apt install kubeadm kubectl kubelet

Initialize the master

sudo kubeadm init --kubernetes-version=v1.13.4 --pod-network-cidr= --service-cidr=
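The CIDR values in this command are blank in the post. A sketch that keeps them in variables so they are explicit; the defaults are assumptions, not the values the author used: `10.244.0.0/16` is the `Network` in flannel's stock `kube-flannel.yml` (the pod CIDR has to match it), and `10.96.0.0/12` is kubeadm's default service CIDR.

```shell
#!/bin/sh
# Assumed values; override them in the environment before sourcing this file.
K8S_VERSION="${K8S_VERSION:-v1.13.4}"
POD_CIDR="${POD_CIDR:-10.244.0.0/16}"
SERVICE_CIDR="${SERVICE_CIDR:-10.96.0.0/12}"

# Print the full init command so it can be reviewed before running it.
kubeadm_init_cmd() {
    echo "sudo kubeadm init --kubernetes-version=$K8S_VERSION" \
        "--pod-network-cidr=$POD_CIDR --service-cidr=$SERVICE_CIDR"
}
```

Printing the command before running it makes the chosen ranges visible up front, which matters because the pod network range is not easy to change once the cluster exists.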

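If swap has not been disabled, kubeadm's preflight checks fail with an error. With swap off, the output looks like this: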


neo@u1810:~$ sudo kubeadm init --kubernetes-version=v1.13.4 --pod-network-cidr= service-cidr=

[init] Using Kubernetes version: v1.13.4

[preflight] Running pre-flight checks

        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06

[preflight] Pulling images required for setting up a Kubernetes cluster

[preflight] This might take a minute or two, depending on the speed of your internet connection

[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet-start] Activating the kubelet service

[certs] Using certificateDir folder "/etc/kubernetes/pki"

[certs] Generating "ca" certificate and key

[certs] Generating "apiserver-kubelet-client" certificate and key

[certs] Generating "apiserver" certificate and key

[certs] apiserver serving cert is signed for DNS names [u1810 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs []

[certs] Generating "etcd/ca" certificate and key

[certs] Generating "etcd/server" certificate and key

[certs] etcd/server serving cert is signed for DNS names [u1810 localhost] and IPs [ ::1]

[certs] Generating "etcd/peer" certificate and key

[certs] etcd/peer serving cert is signed for DNS names [u1810 localhost] and IPs [ ::1]

[certs] Generating "etcd/healthcheck-client" certificate and key

[certs] Generating "apiserver-etcd-client" certificate and key

[certs] Generating "front-proxy-ca" certificate and key

[certs] Generating "front-proxy-client" certificate and key

[certs] Generating "sa" key and public key

[kubeconfig] Using kubeconfig folder "/etc/kubernetes"

[kubeconfig] Writing "admin.conf" kubeconfig file

[kubeconfig] Writing "kubelet.conf" kubeconfig file

[kubeconfig] Writing "controller-manager.conf" kubeconfig file

[kubeconfig] Writing "scheduler.conf" kubeconfig file

[control-plane] Using manifest folder "/etc/kubernetes/manifests"

[control-plane] Creating static Pod manifest for "kube-apiserver"

[control-plane] Creating static Pod manifest for "kube-controller-manager"

[control-plane] Creating static Pod manifest for "kube-scheduler"

[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s

[apiclient] All control plane components are healthy after 31.014621 seconds

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "u1810" as an annotation

[mark-control-plane] Marking the node u1810 as control-plane by adding the label "node-role.kubernetes.io/master=''"

[mark-control-plane] Marking the node u1810 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

[bootstrap-token] Using token: rnrbe5.tq9bglome3cmceci

[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles

[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials

[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token

[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster

[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace

[addons] Applied essential addon: CoreDNS

[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube

  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.

Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:


You can now join any number of machines by running the following on each node

as root:  (the line below is the command and token needed to add a node)

kubeadm join --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f


mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config


kubectl get componentstatus

kubectl get nodes

Install the pod network: flannel

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Check that the master is working

kubectl get nodes

kubectl get pods --all-namespaces

  • Add nodes to the K8S cluster

On each new node, run the join command printed at the end of kubeadm init:

sudo kubeadm join --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f

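#The token expires after 24 hours. To add more nodes later, generate a new token on the master (--print-join-command prints the complete join command):

sudo kubeadm token create --print-join-command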
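The `--discovery-token-ca-cert-hash` value does not change when a new token is created, and it can be recomputed on the master at any time. A sketch adapted from the openssl pipeline in the kubeadm documentation; it assumes the default RSA cluster CA at `/etc/kubernetes/pki/ca.crt`.

```shell
#!/bin/sh
# ca_cert_hash [CERT]: print the SHA-256 of the CA's DER-encoded public key,
# the value kubeadm join expects after "sha256:".
ca_cert_hash() {
    cert="${1:-/etc/kubernetes/pki/ca.crt}"
    openssl x509 -pubkey -noout -in "$cert" |
        openssl rsa -pubin -outform der 2>/dev/null |
        openssl dgst -sha256 -hex |
        sed 's/^.* //'
}

# Usage on the master:
#   echo "sha256:$(ca_cert_hash)"
```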

neo@k8s02:~$ sudo kubeadm join --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f

[preflight] Running pre-flight checks

        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06

[discovery] Trying to connect to API Server ""

[discovery] Created cluster-info discovery client, requesting info from ""

[discovery] Requesting info from "" again to validate TLS against the pinned public key

[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server ""

[discovery] Successfully established connection with API Server ""

[join] Reading configuration from the cluster...

[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[kubelet-start] Activating the kubelet service

[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s02" as an annotation

This node has joined the cluster:

* Certificate signing request was sent to apiserver and a response was received.

* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

On the master, run kubectl get nodes to confirm that the new node appears.

Check that the cluster is healthy

kubectl get cs

kubectl cluster-info

kubectl version --short=true
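Instead of eyeballing the `kubectl get nodes` list, a small filter can flag nodes that are not `Ready`. A sketch that reads the default `kubectl get nodes` table from stdin and assumes STATUS is the second column; `Ready,SchedulingDisabled` (a cordoned node) still counts as ready.

```shell
#!/bin/sh
# all_nodes_ready: read "kubectl get nodes" output on stdin, print nodes that
# are not Ready, and return non-zero if any are (or if no nodes are listed).
all_nodes_ready() {
    awk '
        NR > 1 && $2 != "Ready" && $2 !~ /^Ready,/ { print "not ready: " $1; bad = 1 }
        END {
            if (NR < 2) { print "no nodes listed"; exit 1 }
            exit (bad ? 1 : 0)
        }'
}

# Usage on the master:
#   kubectl get nodes | all_nodes_ready && echo "all nodes are Ready"
```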

Installing the NVIDIA Device Plugin on K8S

Following the official steps at https://github.com/NVIDIA/k8s-device-plugin

  • Prerequisites

The list of prerequisites for running the NVIDIA device plugin is described below:

#Check which NVIDIA hardware is installed

lspci | grep NVIDIA



The GPU used this time is a GeForce RTX 2080 Ti.

sudo chmod +x NVIDIA-Linux-x86_64-418.43.run

sudo ./NVIDIA-Linux-x86_64-418.43.run -no-x-check -no-nouveau-check -no-opengl-files


sudo modprobe nvidia



#On each GPU node, enable the nvidia runtime as the default runtime

#Edit /etc/docker/daemon.json as follows:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

#Restart docker

sudo systemctl daemon-reload && sudo systemctl restart docker
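The `daemon.json` edit and restart above can be scripted for several GPU nodes. A sketch that writes the same JSON and keeps a `.bak` copy of any existing file; it replaces the whole file, so other settings already in `daemon.json` would have to be merged back by hand. It assumes `nvidia-container-runtime` is installed at `/usr/bin/nvidia-container-runtime`, as in the post.

```shell
#!/bin/sh
# write_nvidia_daemon_json [FILE]: write a daemon.json that makes nvidia the default runtime.
write_nvidia_daemon_json() {
    target="${1:-/etc/docker/daemon.json}"
    if [ -f "$target" ]; then
        cp -- "$target" "$target.bak" || return
    fi
    cat > "$target" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Usage on each GPU node (as root), then restart docker:
#   write_nvidia_daemon_json
#   systemctl daemon-reload && systemctl restart docker
```

After the restart, `sudo docker info` should report `Default Runtime: nvidia`.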

#Enabling GPU Support in Kubernetes

#On the master, enable GPU support

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

#Restart kubelet

sudo systemctl daemon-reload && sudo systemctl restart kubelet

#Check whether the GPU nodes have GPU resources available for allocation

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
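With several GPU nodes, the column output above can be totalled. A sketch that reads that custom-columns table from stdin; nodes without the `nvidia.com/gpu` field are printed as `<none>` by kubectl and are skipped.

```shell
#!/bin/sh
# count_allocatable_gpus: sum the GPU column of the NAME/GPU table on stdin.
count_allocatable_gpus() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { total += $2; nodes++ }
         END { printf "%d GPU(s) on %d node(s)\n", total, nodes }'
}

# Usage on the master:
#   kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" |
#       count_allocatable_gpus
```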