Install Airflow on Kubernetes

by iamlucia 2023. 11. 26.

A hands-on walkthrough of installing Airflow in a Kubernetes environment.

Install path: /home/confluent/apaches/airflow_k8s

 

1. Helm Airflow Repo

$ helm repo add apache-airflow https://airflow.apache.org

$ helm repo list
NAME            URL
apache-airflow  https://airflow.apache.org

$ helm search repo apache-airflow
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
apache-airflow/airflow  1.11.0          2.7.1           The official Helm chart to deploy Apache Airflo...

$ helm pull apache-airflow/airflow

$ tar xzf airflow-1.11.0.tgz

2. Git Sync Configuration

Configuration for managing Airflow DAG files through a git repository.

2-1. Generate an SSH Key

First, generate an SSH key for connecting to GitHub over SSH.

$ mkdir .ssh

$ ssh-keygen -t rsa -b 4096 -C "lucia.son.dev@gmail.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/confluent/.ssh/id_rsa): /home/confluent/apaches/airflow_k8s/.ssh/id_rsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/confluent/apaches/airflow_k8s/.ssh/id_rsa.
Your public key has been saved in /home/confluent/apaches/airflow_k8s/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Btzac4+wfabkox56FnL0ZpzhnF0KZScD/gVid18HO1g lucia.son.dev@gmail.com
The key's randomart image is:
+---[RSA 4096]----+
|          +.o E.o|
|     . . o o==.oo|
|      o . .o.++ .|
|       +. o. ... |
|      ..S=.*.o   |
|      ..o*Xoo    |
|       oo++ +    |
|       .o+.+     |
|      .+o.o.     |
+----[SHA256]-----+

$ ll .ssh 

total 8
-rw-------. 1 confluent confluent 3243 Dec  5 13:51 id_rsa
-rw-r--r--. 1 confluent confluent  749 Dec  5 13:51 id_rsa.pub

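2-2. Register the Public Key on GitHub

Add the contents of id_rsa.pub to GitHub so that the key pair generated above can read the DAG repository.

2-3. Add example.py

Push a sample DAG file, example.py, to the repository. With the gitSync settings used later (subPath: dags), it belongs under the dags/ directory.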

2-4. Manage the Private Key as a Secret

1) Create a namespace for the Airflow deployment

$ cat namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: airflow

$ kubectl apply -f namespace.yaml

 

2) Create the secret

$ kubectl create secret generic airflow-ssh-git-secret \
    --from-file=gitSshKey=/home/confluent/apaches/airflow_k8s/.ssh/id_rsa \
    --namespace airflow
secret/airflow-ssh-git-secret created

 

3) Create the webserver secret key

The Airflow webserver is built on Flask, so generate a secret token that Flask can use to sign its session data.

$ python3.8
Python 3.8.9 (default, Nov 13 2023, 17:18:28)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import secrets
>>> print(secrets.token_hex(16))
9dd2f3d0524bd61ba66aaf8d4cb5b959

$ kubectl create secret generic airflow-webserver-secret --from-literal="webserver-secret-key=9dd2f3d0524bd61ba66aaf8d4cb5b959" -n airflow
secret/airflow-webserver-secret created
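
For reference, the two steps above (generating the token and wrapping it in a Secret) can be sketched as a single Python script. This is a minimal sketch, not the method used in this post: it builds the Secret manifest as JSON instead of calling kubectl create secret, and the helper name build_secret_manifest is hypothetical. The secret name, key name, and namespace are the ones used above.

```python
import base64
import json
import secrets


def build_secret_manifest(name, namespace, key, value):
    """Return a Kubernetes Secret manifest (as a dict) holding one key.

    Hypothetical helper for illustration. Values under `data` must be
    base64-encoded, which `kubectl create secret --from-literal`
    normally does for you.
    """
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {key: encoded},
    }


# 16 random bytes rendered as 32 hex characters, as in the session above.
secret_key = secrets.token_hex(16)

manifest = build_secret_manifest(
    name="airflow-webserver-secret",
    namespace="airflow",
    key="webserver-secret-key",
    value=secret_key,
)

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

Saving the output to a file (secret.json is an arbitrary name) and running kubectl apply -f secret.json should give the same result as the kubectl create secret command above.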

3. Airflow custom config

Write a values-override.yaml file that overrides the chart's values.yaml.

executor: "KubernetesExecutor"
webserverSecretKey: webserver-secret-key
webserverSecretKeySecretName: airflow-webserver-secret
data:
  metadataConnection:
    user: young
    pass: test123
    db: airflow
dags:
  gitSync:
    enabled: true
    repo: git@github.com:lucia-son/airflow-dag.git
    branch: main
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: dags
    sshKeySecret: "airflow-ssh-git-secret"
webserver:
  service:
    type: NodePort
    ports:
      - name: airflow-ui
        port: "{{ .Values.ports.airflowUI }}"
        targetPort: "{{ .Values.ports.airflowUI }}"
        nodePort: 31080
postgresql:
  enabled: true
  image:
    tag: "11"
  auth:
    enablePostgresUser: true
    postgresPassword: postgres
    username: young
    password: test123
    database: "airflow"
  primary:
    service:
      type: NodePort
      nodePorts:
        postgresql: 31082
workers:
  persistence:
    size: 10Gi
    storageClassName: local-path
triggerer:
  persistence:
    size: 10Gi
    storageClassName: local-path
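
To see why the override file only needs the keys that change, it helps to picture how the two files are combined. Roughly, Helm layers the file passed with -f on top of the chart's values.yaml: nested maps are merged key by key, while scalars and lists are replaced as a whole. The sketch below imitates that behaviour in plain Python. The function merge_values is hypothetical, and the chart_defaults dictionary is an illustrative stand-in, not a copy of the real chart defaults.

```python
import copy


def merge_values(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested mappings are merged key by key; any other value (scalars
    and lists alike) in `override` replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative stand-ins for a few chart defaults (assumed values,
# not copied from the chart's values.yaml).
chart_defaults = {
    "executor": "CeleryExecutor",
    "dags": {"gitSync": {"enabled": False, "branch": "main", "rev": "HEAD"}},
    "webserver": {
        "service": {"type": "ClusterIP", "ports": [{"name": "airflow-ui"}]}
    },
}

# A subset of the values-override.yaml shown above.
overrides = {
    "executor": "KubernetesExecutor",
    "dags": {"gitSync": {"enabled": True, "subPath": "dags"}},
    "webserver": {
        "service": {
            "type": "NodePort",
            "ports": [{"name": "airflow-ui", "nodePort": 31080}],
        }
    },
}

effective = merge_values(chart_defaults, overrides)
print(effective["executor"])
print(effective["dags"]["gitSync"])
print(effective["webserver"]["service"]["ports"])
```

Note the ports list: because lists are replaced rather than merged, the override has to spell out every field of the port entry, which is why the values file above repeats name, port, and targetPort alongside nodePort.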

 

 

🔎  local-path: local-path-provisioner is used for dynamic provisioning!

>>> If dynamic provisioning still fails even though local-path is set and the provisioner is installed, check whether a PVC with the same name already exists (probably left over from earlier trial and error). Deleting the old PVC and running helm install again worked!
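
The leftover-PVC check can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name find_suspect_pvcs is hypothetical, and the sample data (including the PVC names and the "standard" storage class) is made up to match the shape of kubectl get pvc -o json output.

```python
import json


def find_suspect_pvcs(pvc_list_json, expected_storage_class):
    """Return (name, phase, storageClassName) for PVCs worth a look.

    A PVC is flagged when it is not Bound or when its storage class
    differs from the expected one. The input is the JSON text printed
    by `kubectl get pvc -n <namespace> -o json`.
    """
    suspects = []
    for item in json.loads(pvc_list_json).get("items", []):
        name = item["metadata"]["name"]
        phase = item.get("status", {}).get("phase", "Unknown")
        storage_class = item.get("spec", {}).get("storageClassName")
        if phase != "Bound" or storage_class != expected_storage_class:
            suspects.append((name, phase, storage_class))
    return suspects


# Made-up sample in the shape of `kubectl get pvc -o json` output;
# the PVC names and the "standard" class are hypothetical.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "data-airflow-postgresql-0"},
         "spec": {"storageClassName": "standard"},
         "status": {"phase": "Pending"}},
        {"metadata": {"name": "logs-airflow-triggerer-0"},
         "spec": {"storageClassName": "local-path"},
         "status": {"phase": "Bound"}},
    ]
})

for name, phase, storage_class in find_suspect_pvcs(sample, "local-path"):
    print(f"{name}: phase={phase}, storageClass={storage_class}")
```

With a real cluster, the input would come from kubectl get pvc -n airflow -o json, and a flagged PVC can be removed with kubectl delete pvc <name> -n airflow. PVCs created from a StatefulSet's volumeClaimTemplates are not removed by helm uninstall, which is one way such leftovers arise.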

 

 

 

Also edit charts/postgresql/values.yaml so that the data directory used by PostgreSQL is dynamically provisioned too.

global:
  storageClass: local-path

4. Helm Install

$ helm install airflow -n airflow -f values-override.yaml ./

NAME: airflow
LAST DEPLOYED: Tue Dec  5 16:25:27 2023
NAMESPACE: airflow
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing Apache Airflow 2.7.1!

Your release is named airflow.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

Airflow Webserver:     kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
Default Webserver (Airflow UI) Login credentials:
    username: admin
    password: admin
Default Postgres connection credentials:
    username: young
    password: test123
    port: 5432

You can get Fernet Key value by running the following:

    echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

WARNING:
    Kubernetes workers task logs may not persist unless you configure log persistence or remote logging!
    Logging options can be found at: https://airflow.apache.org/docs/helm-chart/stable/manage-logs.html
    (This warning can be ignored if logging is configured with environment variables or secrets backend)

#####################################################
#  WARNING: You should set dags.gitSync.knownHosts  #
#####################################################

You are using ssh authentication for your gitsync repo, however you currently have SSH known_hosts verification disabled,
making you susceptible to man-in-the-middle attacks!

Information on how to set knownHosts can be found here:
https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#knownhosts

 

 

Open the webserver UI and confirm that the DAG files in the git repo are picked up!

 

 
