Running Apache Airflow as a Specific User: A Security Best Practice

A common security best practice in data engineering is to run services like Apache Airflow under their own non-privileged system user. If a vulnerability in the service is exploited, the attacker is confined to what that user is allowed to do, which limits the potential impact on the rest of the system.

Why Run Airflow as a Specific User?

Running Airflow as a specific user, separate from the user who is developing the code, provides several benefits:

How to Implement This Strategy

Here’s a step-by-step guide on how to implement this strategy:
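
As one small illustration of the idea, a startup check can confirm that the process is running under the intended service account and not as root. This is a minimal sketch, not part of Airflow itself: the account name `airflow` and the helper names `current_username` and `check_service_user` are assumptions made for the example, and the `pwd` module is only available on Unix-like systems.

```python
import os
import pwd

EXPECTED_USER = "airflow"  # hypothetical service account name; adjust to your setup


def current_username() -> str:
    """Return the name of the user that owns the current process (Unix only)."""
    return pwd.getpwuid(os.geteuid()).pw_name


def check_service_user(username: str, expected: str = EXPECTED_USER) -> None:
    """Raise PermissionError if running as root or as an unexpected user."""
    if username == "root":
        raise PermissionError("refusing to run as root")
    if username != expected:
        raise PermissionError(f"expected user {expected!r}, got {username!r}")
```

Calling `check_service_user(current_username())` at the top of a script makes it fail fast when it is started from the wrong account.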

Connecting MongoDB to MariaDB using Apache Airflow

Now, let’s discuss how to connect MongoDB to MariaDB using Apache Airflow. Here’s a basic script, which an Airflow task could run, that reads data from MongoDB and writes it to a MariaDB table:

Python code:

from datetime import datetime

import mariadb
from pymongo import MongoClient

# MongoDB connection
client = MongoClient('mongodb://localhost:27017/')
db = client['your_database']
collection = db['your_collection']

# MariaDB connection (placeholder credentials)
conn = mariadb.connect(user='airflow', password='password', database='your_database')
cur = conn.cursor()

# Create table if not exists, partitioned by ds.
# Note: rows whose ds is on or after 2024-03-01 need a further partition.
cur.execute("""
    CREATE TABLE IF NOT EXISTS dim_linkedin_post (
        post_url VARCHAR(255),
        post_id BIGINT,
        post_date DATE,
        ds DATE,
        PRIMARY KEY (post_id, ds)
    )
    PARTITION BY RANGE COLUMNS (ds) (
        PARTITION p0 VALUES LESS THAN ('2024-01-01'),
        PARTITION p1 VALUES LESS THAN ('2024-02-01'),
        PARTITION p2 VALUES LESS THAN ('2024-03-01')
    )
""")

# Process MongoDB documents
docs = collection.find({})
for doc in docs:
    # Extract attributes
    post_url = doc['postUrl']
    post_id = doc['postId']
    post_date = doc['postDate']
    ds = datetime.now().date()  # Current date

    # Insert or update in MariaDB
    cur.execute("""
        INSERT INTO dim_linkedin_post (post_url, post_id, post_date, ds)
        VALUES (%s, %s, %s, %s)
        ON DUPLICATE KEY UPDATE
        post_url = VALUES(post_url),
        post_date = VALUES(post_date),
        ds = VALUES(ds)
    """, (post_url, post_id, post_date, ds))

# Persist the changes and release both connections
conn.commit()
conn.close()
client.close()

This script reads documents from a MongoDB collection, extracts the postUrl, postId, and postDate attributes, and inserts them into a MariaDB table called dim_linkedin_post as post_url, post_id, and post_date. The ds column is not read from the document: it is set to the current date and is used for partitioning the data by date. If a record with the same post_id and ds already exists, the script updates the existing record.
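
The per-document mapping described above can be factored into a small pure function, which makes it testable without either database. This is a sketch under assumptions: the helper name `doc_to_row` is hypothetical, and it assumes postDate may arrive as a `datetime` and postId as something `int()` can convert.

```python
from datetime import date, datetime


def doc_to_row(doc: dict, ds: date) -> tuple:
    """Map one MongoDB document to a (post_url, post_id, post_date, ds) row."""
    post_date = doc["postDate"]
    # Assumption: postDate may be a datetime; keep only the date part,
    # since the target column is DATE.
    if isinstance(post_date, datetime):
        post_date = post_date.date()
    # Assumption: postId is numeric or a numeric string (target column is BIGINT).
    return (doc["postUrl"], int(doc["postId"]), post_date, ds)
```

The returned tuple has the same column order as the INSERT statement, so it can be passed directly as the parameter tuple to `cur.execute`.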

Remember, security is a journey, not a destination. Always stay informed about the latest best practices and continually review and update your security measures as necessary. Happy coding! 😊