Hands-on: installing Airflow in a Kubernetes environment.
Install path: /home/confluent/apaches/airflow_k8s
1. Helm Airflow Repo
$ helm repo add apache-airflow https://airflow.apache.org
$ helm repo list
NAME            URL
apache-airflow  https://airflow.apache.org
$ helm search repo apache-airflow
NAME CHART VERSION APP VERSION DESCRIPTION
apache-airflow/airflow 1.11.0 2.7.1 The official Helm chart to deploy Apache Airflo...
$ helm pull apache-airflow/airflow
$ tar xzf airflow-1.11.0.tgz
2. Git Sync Configuration
Configure git-sync so that Airflow DAG files are managed through a git repository.
2-1. Generate an SSH Key
First, generate an SSH key for connecting to GitHub over SSH.
$ mkdir .ssh
$ ssh-keygen -t rsa -b 4096 -C "lucia.son.dev@gmail.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/confluent/.ssh/id_rsa): /home/confluent/apaches/airflow_k8s/.ssh/id_rsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/confluent/apaches/airflow_k8s/.ssh/id_rsa.
Your public key has been saved in /home/confluent/apaches/airflow_k8s/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Btzac4+wfabkox56FnL0ZpzhnF0KZScD/gVid18HO1g lucia.son.dev@gmail.com
The key's randomart image is:
+---[RSA 4096]----+
| +.o E.o|
| . . o o==.oo|
| o . .o.++ .|
| +. o. ... |
| ..S=.*.o |
| ..o*Xoo |
| oo++ + |
| .o+.+ |
| .+o.o. |
+----[SHA256]-----+
$ ll .ssh
total 8
-rw-------. 1 confluent confluent 3243 Dec 5 13:51 id_rsa
-rw-r--r--. 1 confluent confluent 749 Dec 5 13:51 id_rsa.pub
2-2. Register the public key on GitHub
Copy the contents of id_rsa.pub into GitHub under Settings > SSH and GPG keys > New SSH key.
2-3. Register example.py
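As a sketch, the example.py pushed to the repo's dags/ path can be a minimal DAG like the following (dag_id, schedule, and the bash task are illustrative placeholders, not the file actually used here):

```python
# Minimal example DAG for the git-sync repo (placed under dags/example.py).
# dag_id, schedule, and the task below are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_git_sync",
    start_date=datetime(2023, 12, 1),
    schedule="@daily",   # Airflow 2.4+ spelling of schedule_interval
    catchup=False,
) as dag:
    # Single task that just echoes, enough to confirm git-sync picks it up
    BashOperator(task_id="hello", bash_command="echo hello from git-sync")
```

Once pushed to the repo's main branch, git-sync should pull it into the scheduler automatically.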
2-4. Manage the private key as a Secret
1) Create a namespace for the Airflow deployment
$ cat namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: airflow
$ kubectl apply -f namespace.yaml
2) Create the secret
$ kubectl create secret generic airflow-ssh-git-secret \
--from-file=gitSshKey=/home/confluent/apaches/airflow_k8s/.ssh/id_rsa \
--namespace airflow
secret/airflow-ssh-git-secret created
3) Generate the webserver secret key
Airflow's webserver is built with Flask, so generate a secret token for the session IDs that Flask manages.
$ python3.8
Python 3.8.9 (default, Nov 13 2023, 17:18:28)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import secrets
>>> print(secrets.token_hex(16))
9dd2f3d0524bd61ba66aaf8d4cb5b959
$ kubectl create secret generic airflow-webserver-secret --from-literal="webserver-secret-key=9dd2f3d0524bd61ba66aaf8d4cb5b959" -n airflow
secret/airflow-webserver-secret created
3. Airflow custom config
Write a values-override.yaml file so that values.yaml can be overridden:
executor: "KubernetesExecutor"
webserverSecretKey: webserver-secret-key
webserverSecretKeySecretName: airflow-webserver-secret
data:
  metadataConnection:
    user: young
    pass: test123
    db: airflow
dags:
  gitSync:
    enabled: true
    repo: git@github.com:lucia-son/airflow-dag.git
    branch: main
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: dags
    sshKeySecret: "airflow-ssh-git-secret"
webserver:
  service:
    type: NodePort
    ports:
      - name: airflow-ui
        port: "{{ .Values.ports.airflowUI }}"
        targetPort: "{{ .Values.ports.airflowUI }}"
        nodePort: 31080
postgresql:
  enabled: true
  image:
    tag: "11"
  auth:
    enablePostgresUser: true
    postgresPassword: postgres
    username: young
    password: test123
    database: "airflow"
  primary:
    service:
      type: NodePort
      nodePorts:
        postgresql: 31082
workers:
  persistence:
    size: 10Gi
    storageClassName: local-path
triggerer:
  persistence:
    size: 10Gi
    storageClassName: local-path
🔎 local-path: the local-path-provisioner is used for dynamic provisioning!
>>> If dynamic provisioning does not happen even though local-path is specified and the provisioner exists, check whether a PVC with the same name already exists (probably created during earlier trial and error): `kubectl get pvc -n airflow`. Delete the old PVC with `kubectl delete pvc <pvc-name> -n airflow` and retry helm install, and it will succeed!
Dynamic provisioning for the data directory used by postgresql is also configured by editing charts/postgresql/values.yaml:
global:
  storageClass: local-path
4. Helm Install
$ helm install airflow -n airflow -f values-override.yaml ./
NAME: airflow
LAST DEPLOYED: Tue Dec 5 16:25:27 2023
NAMESPACE: airflow
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing Apache Airflow 2.7.1!
Your release is named airflow.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:
Airflow Webserver: kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
Default Webserver (Airflow UI) Login credentials:
username: admin
password: admin
Default Postgres connection credentials:
username: young
password: test123
port: 5432
You can get Fernet Key value by running the following:
echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)
WARNING:
Kubernetes workers task logs may not persist unless you configure log persistence or remote logging!
Logging options can be found at: https://airflow.apache.org/docs/helm-chart/stable/manage-logs.html
(This warning can be ignored if logging is configured with environment variables or secrets backend)
#####################################################
# WARNING: You should set dags.gitSync.knownHosts #
#####################################################
You are using ssh authentication for your gitsync repo, however you currently have SSH known_hosts verification disabled,
making you susceptible to man-in-the-middle attacks!
Information on how to set knownHosts can be found here:
https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#knownhosts
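The knownHosts warning above can be addressed by pinning GitHub's SSH host key in values-override.yaml. A sketch (verify the key below against GitHub's currently published SSH key fingerprints before trusting it):

```yaml
# Pin GitHub's host key so git-sync can verify the server (MITM protection).
# Check this key against GitHub's published fingerprints before use.
dags:
  gitSync:
    knownHosts: |
      github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
```

After adding this, run helm upgrade with the same values file to apply it.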
After opening the webserver UI, verify that the DAG files in the git repo are being pulled in!
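Since postgresql is exposed on NodePort 31082, the metadata DB can also be reached from outside the cluster with the credentials from values-override.yaml. A minimal sketch of building the SQLAlchemy connection URI (the hostname k8s-node.example.com is a placeholder for one of your node addresses):

```python
# Sketch: build the SQLAlchemy URI for the Airflow metadata DB reached
# via the chart's postgresql NodePort. Hostname below is a placeholder.
from urllib.parse import quote_plus

def metadata_db_uri(user: str, password: str, host: str, port: int, db: str) -> str:
    # quote_plus guards against special characters in user/password
    return f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"

uri = metadata_db_uri("young", "test123", "k8s-node.example.com", 31082, "airflow")
print(uri)
```

The same URI shape is what Airflow itself uses internally as sql_alchemy_conn against the in-cluster service on port 5432.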