Monday, April 12, 2021

Ubuntu: Running fsck inside BusyBox

 

Ubuntu hung and had to be forcibly rebooted.

During boot, the system dropped into the BusyBox (initramfs) prompt.
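
At the (initramfs) prompt, fsck has to be run against the root partition by hand. A minimal sketch, assuming the root filesystem is on /dev/sda1 (check with blkid or cat /proc/partitions first):

fsck -yf /dev/sda1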



After the fsck check completed, I rebooted the server and everything was back to normal.

Monday, March 8, 2021

Exchange 2016: Upgrading to a Cumulative Update

Notes:

  1. When upgrading to a Cumulative Update in a DAG architecture, the DAG member must be suspended first
  2. DAG members must be upgraded back to back, no more than one day apart
  3. Run the setup from CMD (run as administrator); otherwise you may hit permission errors, failed file updates, or other anomalies
  4. Back up first (Exchange & DC) in case the upgrade fails


Procedure:
Prepare Active Directory and Domains:

Cumulative Update pre-upgrade tasks:

Run the Cumulative Update
  • Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:Upgrade [/DomainController:<ServerFQDN>] [/EnableErrorReporting]
  • Restart the server

Cumulative Update post-upgrade tasks:
  • Check the event logs for errors or warnings
  • Confirm all services have started
  • Take the Exchange Server out of maintenance mode
  • Verify Exchange is working normally




Prepare Active Directory and Domains:

Setup.exe /IAcceptExchangeServerLicenseTerms /PrepareSchema
Setup.exe /IAcceptExchangeServerLicenseTerms /PrepareAD
Setup.exe /IAcceptExchangeServerLicenseTerms /PrepareAllDomains
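
To confirm the AD preparation took effect, a common check (not in the original notes) is to read the Exchange schema version from AD. A sketch, run from a machine with the ActiveDirectory module:

Import-Module ActiveDirectory
# rangeUpper on ms-Exch-Schema-Version-Pt is the Exchange schema version; compare it against the value Microsoft documents for the target CU
$schema = (Get-ADRootDSE).schemaNamingContext
Get-ADObject "CN=ms-Exch-Schema-Version-Pt,$schema" -Properties rangeUpper | Select-Object rangeUpper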



Cumulative Update pre-upgrade tasks:
Put the Exchange Server into maintenance mode
(Replace ExchangeServer with your Exchange Server's computer name,
ExchangeServer01 with the computer name of the other Exchange Server that will take over, and EXDAG01 with the DAG cluster name)

## 1. Drain the mail queues on the server
Set-ServerComponentState -Identity ExchangeServer -Component HubTransport -State Draining -Requester Maintenance

## 2. Restart-Service
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport

## 3. Redirect pending messages to another Mailbox server.
Redirect-Message -Server ExchangeServer.coretronic.com -Target ExchangeServer01.coretronic.com

## 4. If the server is a DAG member, suspend it from the DAG first
Suspend-ClusterNode ExchangeServer

## 5. Move all the active databases off of the server to another DAG member.
Set-MailboxServer ExchangeServer -DatabaseCopyActivationDisabledAndMoveNow $True

## 6. Look at the status of the database auto activation policy. Keep this handy. We will need this when we take the server out of maintenance mode.
Get-MailboxServer ExchangeServer | Select DatabaseCopyAutoActivationPolicy

## 7. Block the server from hosting active database copies
Set-MailboxServer ExchangeServer -DatabaseCopyAutoActivationPolicy Blocked

## 8. Confirm the current state and whether the server can enter maintenance mode
Get-MailboxDatabaseCopyStatus -Server ExchangeServer | Where {$_.Status -eq "Mounted"}
Get-Queue

## 9. Place the server in maintenance mode.
Set-ServerComponentState ExchangeServer -Component ServerWideOffline -State Inactive -Requester Maintenance

## Confirm the current state
Get-ServerComponentState ExchangeServer | Select Component, State



  • Stop the antivirus service
  • Stop the backup software service
  • Close other software/services (MMC, PowerShell, monitoring) -- see the sketch below
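
A minimal sketch of stopping those services from PowerShell; "AVService" and "BackupAgent" are hypothetical placeholders -- substitute the actual antivirus/backup service names on the server:

# Hypothetical service names -- replace with the real antivirus/backup services
Stop-Service -Name "AVService" -Force
Stop-Service -Name "BackupAgent" -Force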



Run the Cumulative Update
  • Setup.exe /IAcceptExchangeServerLicenseTerms /Mode:Upgrade [/DomainController:<ServerFQDN>] [/EnableErrorReporting]
  • Restart the server




Cumulative Update post-upgrade tasks:
  • Check the event logs for errors or warnings (example check commands at the end of this section)
  • Confirm all services have started
  • Take the Exchange Server out of maintenance mode
## 1. Take the server out of maintenance mode
Set-ServerComponentState ExchangeServer -Component ServerWideOffline -State Active -Requester Maintenance

## 2. Unpause the DAG node
Resume-ClusterNode ExchangeServer

## 3. Set the server to allow database copy activation
Set-MailboxServer ExchangeServer -DatabaseCopyActivationDisabledAndMoveNow $False

## 4. Set the database auto activation policy back to its original setting
Set-MailboxServer ExchangeServer -DatabaseCopyAutoActivationPolicy Unrestricted

## 5. Set the hub transport component back to active to allow it to accept connections.
Set-ServerComponentState ExchangeServer -Component HubTransport -State Active -Requester Maintenance

## 6. MS recommends that the transport services be restarted to help this change get picked up immediately.
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport

## 7. If the server was a DAG member and you moved all the active copies off of the server, you can easily move them back based on mount preference by running the RedistributeActiveDatabases.ps1 script

cd $exscripts
.\RedistributeActiveDatabases.ps1 -DagName "EXDAG01" -BalanceDbsByActivationPreference -SkipMoveSuppressionChecks -Confirm:$false



  • Verify Exchange is working normally

Get-Queue


Get-ClusterNode ExchangeServer

Get-ClusterNode

Test-ServiceHealth ExchangeServer

Get-ExchangeServer | Test-ServiceHealth

Test-MAPIConnectivity -Server ExchangeServer

Get-ExchangeServer | Test-MAPIConnectivity

Get-MailboxDatabaseCopyStatus -Server "ExchangeServer" | Sort Name | Select Name, Status, Contentindexstate

Get-MailboxDatabaseCopyStatus * | Sort Name | Select Name, Status, Contentindexstate

Test-ReplicationHealth -Server ExchangeServer

Get-DatabaseAvailabilityGroup | Select -ExpandProperty:Servers | Test-ReplicationHealth | Sort Name

Get-MailboxServer ExchangeServer | Select Name, DatabaseCopyAutoActivationPolicy

Get-MailboxServer | Select Name, DatabaseCopyAutoActivationPolicy
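
For the event-log and service checks listed in the post-upgrade tasks, a sketch using standard cmdlets (not in the original notes):

# Recent errors/warnings in the Application log
Get-EventLog -LogName Application -EntryType Error,Warning -Newest 20

# Any Exchange services that are not running
Get-Service | Where-Object {$_.DisplayName -like "*Exchange*" -and $_.Status -ne "Running"}

# Confirm the build number reflects the new CU
Get-ExchangeServer | Select Name, AdminDisplayVersion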






Monday, March 18, 2019

Kubernetes 1.13.4 Installation Test -- NVIDIA Device Plugin


Machine names and corresponding IPs

K8S01    Master       192.168.8.53   Ubuntu 18.04
K8S02    Node         192.168.8.54   Ubuntu 18.04
K8S03    Node         192.168.8.55   Ubuntu 18.04
SVAI01   Node - GPU   192.168.3.48   Ubuntu 18.04
SVAI02   Node - GPU   192.168.3.49   Ubuntu 18.04


Pre-installation notes

  • Set the hostname

sudo hostnamectl set-hostname k8s-master

sudo vi /etc/hostname

  • Add all of the hosts to /etc/hosts -- see the sketch below
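
A sketch of the entries, assuming hostnames matching the table above:

192.168.8.53  k8s01
192.168.8.54  k8s02
192.168.8.55  k8s03
192.168.3.48  svai01
192.168.3.49  svai02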

  • Disable the firewall

sudo iptables -F

  • Disable system swap

sudo swapoff -a

Edit /etc/fstab so swap is not mounted automatically

sudo sed -e '/swap/ s/^#*/#/' -i /etc/fstab

Confirm it is off

free -m

  • Update the system packages to the latest versions on all nodes:

sudo apt-get update

sudo apt-get upgrade

sudo apt-get install linux-image-extra-virtual

sudo reboot

  • Add a user to manage the Kubernetes cluster:

sudo useradd -s /bin/bash -m kube

sudo passwd kube   # password: kube

sudo usermod -aG sudo kube

echo "kube ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/kube


Install Docker Engine

First make sure any old versions of the Docker engine have been removed:

sudo apt-get remove docker docker-engine docker.io

Install the required packages

sudo apt-get install apt-transport-https ca-certificates curl software-properties-common

Install Docker

sudo apt install docker.io

sudo systemctl enable docker

Install Docker CE

Install the GPG key

https_proxy=192.168.1.88:3128 wget https://download.docker.com/linux/ubuntu/gpg -O docker.key

sudo apt-key add docker.key

Add the repository source information (alternatively, create a new file for the Docker repository at /etc/apt/sources.list.d/docker.list):

sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

Install Docker CE

sudo apt-get install docker-ce


Testing hello-world produced an error right away

sudo docker run hello-world

First create the directory below

sudo mkdir /etc/systemd/system/docker.service.d

Then add an http-proxy.conf file

sudo vi /etc/systemd/system/docker.service.d/http-proxy.conf

Contents as follows:

[Service]

Environment="HTTP_PROXY=http://192.168.2.91:80/"

Environment="HTTPS_PROXY=http://192.168.2.91:80/"


sudo systemctl daemon-reload

sudo systemctl show --property Environment docker

sudo systemctl restart docker

Running sudo docker run hello-world again still failed, but with a different error code -- docker login is needed.

Register a Docker account, then log in once:

sudo docker login

sudo docker run hello-world   # run it again -- it finally succeeded


Install Kubernetes

Add the key and repository

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

Install the K8S packages

sudo apt install kubeadm kubectl kubelet
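
Optionally (not in the original notes), hold the packages so a routine apt upgrade does not move the cluster version unexpectedly:

sudo apt-mark hold kubeadm kubectl kubelet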

Initialize the Master

sudo kubeadm init --kubernetes-version=v1.13.4 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12

If swap has not been disabled, an error will appear at this point

Run output:

neo@u1810:~$ sudo kubeadm init --kubernetes-version=v1.13.4 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12

[init] Using Kubernetes version: v1.13.4

[preflight] Running pre-flight checks

        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06

[preflight] Pulling images required for setting up a Kubernetes cluster

[preflight] This might take a minute or two, depending on the speed of your internet connection

[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet-start] Activating the kubelet service

[certs] Using certificateDir folder "/etc/kubernetes/pki"

[certs] Generating "ca" certificate and key

[certs] Generating "apiserver-kubelet-client" certificate and key

[certs] Generating "apiserver" certificate and key

[certs] apiserver serving cert is signed for DNS names [u1810 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.8.53]

[certs] Generating "etcd/ca" certificate and key

[certs] Generating "etcd/server" certificate and key

[certs] etcd/server serving cert is signed for DNS names [u1810 localhost] and IPs [192.168.8.53 127.0.0.1 ::1]

[certs] Generating "etcd/peer" certificate and key

[certs] etcd/peer serving cert is signed for DNS names [u1810 localhost] and IPs [192.168.8.53 127.0.0.1 ::1]

[certs] Generating "etcd/healthcheck-client" certificate and key

[certs] Generating "apiserver-etcd-client" certificate and key

[certs] Generating "front-proxy-ca" certificate and key

[certs] Generating "front-proxy-client" certificate and key

[certs] Generating "sa" key and public key

[kubeconfig] Using kubeconfig folder "/etc/kubernetes"

[kubeconfig] Writing "admin.conf" kubeconfig file

[kubeconfig] Writing "kubelet.conf" kubeconfig file

[kubeconfig] Writing "controller-manager.conf" kubeconfig file

[kubeconfig] Writing "scheduler.conf" kubeconfig file

[control-plane] Using manifest folder "/etc/kubernetes/manifests"

[control-plane] Creating static Pod manifest for "kube-apiserver"

[control-plane] Creating static Pod manifest for "kube-controller-manager"

[control-plane] Creating static Pod manifest for "kube-scheduler"

[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s

[apiclient] All control plane components are healthy after 31.014621 seconds

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "u1810" as an annotation

[mark-control-plane] Marking the node u1810 as control-plane by adding the label "node-role.kubernetes.io/master=''"

[mark-control-plane] Marking the node u1810 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

[bootstrap-token] Using token: rnrbe5.tq9bglome3cmceci

[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles

[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials

[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token

[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster

[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace

[addons] Applied essential addon: CoreDNS

[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube

  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.

Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node

as root:   (the line below is the command and token needed to add Nodes)

kubeadm join 192.168.8.53:6443 --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f


Create the user config file

mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

Test

kubectl get componentstatus

kubectl get nodes


Install the network add-on -- flannel

sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml


Confirm the Master is healthy

kubectl get nodes

kubectl get pods --all-namespaces


  • Add Nodes to the K8S Cluster

Using the join command printed at the end of kubeadm init, run the following on each new Node:

sudo kubeadm join 192.168.8.53:6443 --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f


#Tokens expire after 24 hours; if more nodes need to be added later, generate a new token

kubeadm token create
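
kubeadm can also print the complete join command directly, which saves reassembling the token and CA cert hash by hand:

kubeadm token create --print-join-command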


neo@k8s02:~$ sudo kubeadm join 192.168.8.53:6443 --token rnrbe5.tq9bglome3cmceci --discovery-token-ca-cert-hash sha256:e7db4a5329742758c6a448bced245b1a9f257e17430fac437dda1c889b13af4f

[preflight] Running pre-flight checks

        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06

[discovery] Trying to connect to API Server "192.168.8.53:6443"

[discovery] Created cluster-info discovery client, requesting info from "https://192.168.8.53:6443"

[discovery] Requesting info from "https://192.168.8.53:6443" again to validate TLS against the pinned public key

[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.8.53:6443"

[discovery] Successfully established connection with API Server "192.168.8.53:6443"

[join] Reading configuration from the cluster...

[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[kubelet-start] Activating the kubelet service

[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s02" as an annotation

This node has joined the cluster:

* Certificate signing request was sent to apiserver and a response was received.

* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.


Run kubectl get nodes on the Master to confirm the new Node shows up


Check whether the cluster is healthy

kubectl get cs

kubectl cluster-info

kubectl version --short=true


Installing the NVIDIA Device Plugin on K8S

Following the official steps: https://github.com/NVIDIA/k8s-device-plugin


  • Prerequisites

The list of prerequisites for running the NVIDIA device plugin is described below:


#Check the NVIDIA hardware present

lspci | grep NVIDIA

#Install the official NVIDIA driver manually

https://www.nvidia.com/Download/

The graphics card used this time is a 2080 Ti


sudo chmod +x NVIDIA-Linux-x86_64-418.43.run

sudo ./NVIDIA-Linux-x86_64-418.43.run -no-x-check -no-nouveau-check -no-opengl-files


#Load the NVIDIA driver module

modprobe nvidia

#Check the GPU info

nvidia-smi  


#On each GPU Node, enable the nvidia runtime as the default runtime

#Edit /etc/docker/daemon.json as follows:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}


#Restart docker

sudo systemctl daemon-reload && sudo systemctl restart docker
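
A quick sanity check (not in the original notes) that containers can see the GPU through the nvidia runtime; the CUDA image tag is an assumption -- pick one matching the installed driver:

#Should print the same nvidia-smi table as on the host
sudo docker run --rm nvidia/cuda:10.0-base nvidia-smi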


#Enabling GPU Support in Kubernetes

#Run on the Master to enable GPU support

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

#Restart kubelet

sudo systemctl daemon-reload && sudo systemctl restart kubelet
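
Confirm the device-plugin pods came up on the GPU nodes:

kubectl get pods -n kube-system | grep nvidia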

#Confirm the GPU Nodes have allocatable GPU resources (a test pod sketch follows)

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
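
Finally, a sketch of a test pod that requests one GPU and runs nvidia-smi; the pod name gpu-test and the CUDA image tag are assumptions:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:10.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

#Check the output once the pod completes
kubectl logs gpu-test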