Introducing shared VPC support on Amazon MWAA

In this post, we demonstrate automating deployment of Amazon Managed Workflows for Apache Airflow (Amazon MWAA) using customer-managed endpoints in a VPC, providing compatibility with shared, or otherwise restricted, VPCs.

Data scientists and engineers have made Apache Airflow a leading open source tool to create data pipelines due to its active open source community, familiar Python development as Directed Acyclic Graph (DAG) workflows, and extensive library of pre-built integrations. Amazon MWAA is a managed service for Airflow that makes it easy to run Airflow on AWS without the operational burden of having to manage the underlying infrastructure. For each Airflow environment, Amazon MWAA creates a single-tenant service VPC, which hosts the metadatabase that stores states and the web server that provides the user interface. Amazon MWAA further manages Airflow scheduler and worker instances in a customer-owned and managed VPC, in order to schedule and run tasks that interact with customer resources. Those Airflow containers in the customer VPC access resources in the service VPC via a VPC endpoint.

Many organizations choose to centrally manage their VPC using AWS Organizations, allowing a VPC in an owner account to be shared with resources in a different participant account. However, because creating a new route outside of a VPC is considered a privileged operation, participant accounts can’t create endpoints in owner VPCs. Furthermore, many customers don’t want to extend the security privileges required to create VPC endpoints to all users provisioning Amazon MWAA environments. In addition to VPC endpoints, customers also wish to restrict data egress via Amazon Simple Queue Service (Amazon SQS) queues, and Amazon SQS access is a requirement in the Amazon MWAA architecture.

Shared VPC support for Amazon MWAA adds the ability for you to manage your own endpoints within your VPCs, adding compatibility to shared and otherwise restricted VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your Amazon MWAA environments. This post demonstrates how customer-managed endpoints work with Amazon MWAA and provides examples of how to automate the provisioning of those endpoints.

Solution overview

Shared VPC support for Amazon MWAA allows multiple AWS accounts to create their Airflow environments in shared, centrally managed VPCs. The account that owns the VPC (owner) shares the two private subnets required by Amazon MWAA with other accounts (participants) that belong to the same organization from AWS Organizations. After the subnets are shared, the participants can view, create, modify, and delete Amazon MWAA environments in the subnets shared with them.

When users specify the need for a shared, or otherwise policy-restricted, VPC during environment creation, Amazon MWAA will first create the service VPC resources, then enter a pending state for up to 72 hours, with an Amazon EventBridge notification of the change in state. This allows owners to create the required endpoints on behalf of participants based on endpoint service information from the Amazon MWAA console or API, or programmatically via an AWS Lambda function and EventBridge rule, as in the example in this post.

After those endpoints are created on the owner account, the endpoint service in the single-tenant Amazon MWAA VPC will detect the endpoint connection event and resume environment creation. Should there be an issue, you can cancel environment creation by deleting the environment during this pending state.

This feature also allows you to remove the create, modify, and delete VPCE privileges from the AWS Identity and Access Management (IAM) principal creating Amazon MWAA environments, even when not using a shared VPC, because that permission will instead be imposed on the IAM principal creating the endpoint (the Lambda function in our example). Furthermore, the Amazon MWAA environment will provide the SQS queue Amazon Resource Name (ARN) used by the Airflow Celery Executor to queue tasks (the Celery Executor Queue), allowing you to explicitly enter those resources into your network policy rather than having to provide a more open and generalized permission.

In this example, we create the VPC and Amazon MWAA environment in the same account. For shared VPCs across accounts, the EventBridge rule and Lambda function would exist in the owner account, and the Amazon MWAA environment would be created in the participant account. See Sending and receiving Amazon EventBridge events between AWS accounts for more information.
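To make the notification concrete, the following minimal sketch shows the event fields that the Lambda function later in this post reads. The field names are taken from that function; every value, and the helper name endpoint_services, is a made-up placeholder for illustration, and a real event may carry additional fields.

```python
# Hypothetical status-change event. Field names follow the Lambda function
# in this post; all values are placeholders, not real service output.
sample_event = {
    "source": "aws.airflow",
    "detail-type": "MWAA Environment Status Change",
    "detail": {
        "name": "example-environment",
        "status": "PENDING",
        "webserverAccessMode": "PUBLIC_ONLY",
        "celeryExecutorQueue": "arn:aws:sqs:us-east-1:111122223333:example-queue",
        "databaseVpcEndpointService": "com.amazonaws.vpce.us-east-1.vpce-svc-EXAMPLE-DB",
        "networkConfiguration": {
            "subnetIds": ["subnet-aaaa", "subnet-bbbb"],
            "securityGroupIds": ["sg-cccc"],
        },
    },
}


def endpoint_services(event):
    """Return the endpoint service names the VPC owner must connect to."""
    detail = event["detail"]
    if detail["status"] != "PENDING":
        return []
    services = [detail["databaseVpcEndpointService"]]
    # The web server service is only needed for private web server access.
    if detail["webserverAccessMode"] == "PRIVATE_ONLY":
        services.append(detail["webserverVpcEndpointService"])
    return services


print(endpoint_services(sample_event))
```

In the cross-account case, this is the information the owner account needs from the participant's environment in order to create the endpoints.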

Prerequisites

You should have the following prerequisites:

  • An AWS account
  • An AWS user in that account, with permissions to create VPCs, VPC endpoints, and Amazon MWAA environments
  • An Amazon Simple Storage Service (Amazon S3) bucket in that account, with a folder called dags

Create the VPC

We begin by creating a restrictive VPC using an AWS CloudFormation template, in order to simulate creating the required VPC endpoint and modifying the SQS endpoint policy. If you want to use an existing VPC, you can proceed to the next section.

  1. Download the CloudFormation template referenced in Option three: Creating an Amazon VPC network without Internet access.
  2. Extract the file cfn-vpc-private-bjs.yml from the downloaded ZIP archive.
  3. Now we edit our CloudFormation template to restrict access to Amazon SQS. In cfn-vpc-private-bjs.yml, edit the SqsVpcEndoint section to appear as follows:
   SqsVpcEndoint:
     Type: AWS::EC2::VPCEndpoint
     Properties:
       ServiceName: !Sub "com.amazonaws.${AWS::Region}.sqs"
       VpcEndpointType: Interface
       VpcId: !Ref VPC
       PrivateDnsEnabled: true
       SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
       SecurityGroupIds:
        - !Ref SecurityGroup
       PolicyDocument:
        Statement:
         - Effect: Allow
           Principal: '*'
           Action: '*'
           Resource: []

This additional policy document entry prevents Amazon SQS egress to any resource not explicitly listed.
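As a rough illustration of how such a policy is later opened up for a single queue, the sketch below adds one queue ARN to the first Allow statement of a policy document, which mirrors what the Lambda function in this post does against the live endpoint. The function name allow_queue and the ARN are hypothetical, and the sketch only manipulates a dictionary; it does not call AWS.

```python
import json


def allow_queue(policy_document, queue_arn):
    """Return a copy of the policy with queue_arn added to the first Allow statement."""
    policy = json.loads(json.dumps(policy_document))  # deep copy via JSON round trip
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        # A single resource may be written as a plain string; normalize to a list.
        if isinstance(resources, str):
            resources = [resources]
        # Leave the statement alone if it already covers the queue.
        if "*" not in resources and queue_arn not in resources:
            resources.append(queue_arn)
        statement["Resource"] = resources
        break  # only the first Allow statement is updated
    return policy


restrictive = {
    "Statement": [
        {"Effect": "Allow", "Principal": "*", "Action": "*", "Resource": []}
    ]
}
queue = "arn:aws:sqs:us-east-1:111122223333:example-queue"  # placeholder ARN
print(json.dumps(allow_queue(restrictive, queue), indent=2))
```

Because the starting Resource list is empty, the endpoint allows nothing until a queue ARN is added explicitly.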

Now we can create our CloudFormation stack.

  1. On the AWS CloudFormation console, choose Create stack.
  2. Select Upload a template file.
  3. Choose Choose file.
  4. Browse to the file you modified.
  5. Choose Next.
  6. For Stack name, enter MWAA-Environment-VPC.
  7. Choose Next until you reach the review page.
  8. Choose Submit.
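If you prefer to script the console steps above, a minimal sketch with boto3 might look like the following. It assumes the edited template sits in the working directory, that the template's parameter defaults are acceptable, and that no extra capabilities are required; the function names are hypothetical.

```python
def build_stack_request(template_body, stack_name="MWAA-Environment-VPC"):
    """Build the keyword arguments for CloudFormation's create_stack call."""
    return {"StackName": stack_name, "TemplateBody": template_body}


def create_vpc_stack(template_path="cfn-vpc-private-bjs.yml"):
    """Create the stack and wait until CloudFormation reports completion."""
    import boto3  # imported here so the request builder works without boto3

    with open(template_path) as template_file:
        request = build_stack_request(template_file.read())
    cloudformation = boto3.client("cloudformation")
    stack_id = cloudformation.create_stack(**request)["StackId"]
    # The waiter polls the stack and raises if creation fails or rolls back.
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_id)
    return stack_id


# Usage (requires AWS credentials and the edited template on disk):
# create_vpc_stack()
```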

Create the Lambda function

We have two options for self-managing our endpoints: manual and automated. In this example, we create a Lambda function that responds to the Amazon MWAA EventBridge notification. You could also use the EventBridge notification to send an Amazon Simple Notification Service (Amazon SNS) message, such as an email, to someone with permission to create the VPC endpoint manually.

First, we create a Lambda function to respond to the EventBridge event that Amazon MWAA will emit.

  1. On the Lambda console, choose Create function.
  2. For Name, enter mwaa-create-lambda.
  3. For Runtime, choose Python 3.11.
  4. Choose Create function.
  5. For Code, in the Code source section, for lambda_function, enter the following code:
    import boto3
    import json
    import logging
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def lambda_handler(event, context):
        # Only act while the environment is waiting for its endpoints
        if event['detail']['status']=="PENDING":
            detail=event['detail']
            name=detail['name']
            celeryExecutorQueue=detail['celeryExecutorQueue']
            subnetIds=detail['networkConfiguration']['subnetIds']
            securityGroupIds=detail['networkConfiguration']['securityGroupIds']
            databaseVpcEndpointService=detail['databaseVpcEndpointService']
    
            # MWAA does not need to store the VPC ID, but we can get it from the subnets
            client = boto3.client('ec2')
            response = client.describe_subnets(SubnetIds=subnetIds)
            vpcId=response['Subnets'][0]['VpcId']
            logger.info("vpcId: " + vpcId)
            
            # The web server endpoint service is only present for private web server access
            webserverVpcEndpointService=None
            if detail['webserverAccessMode']=="PRIVATE_ONLY":
                webserverVpcEndpointService=detail['webserverVpcEndpointService']
            
            # Look for an existing SQS endpoint in the VPC
            response = client.describe_vpc_endpoints(
                Filters=[
                    {"Name": "vpc-id", "Values": [vpcId]},
                    {"Name": "service-name", "Values": ["*.sqs"]},
                    ],
                MaxResults=1000
            )
            sqsVpcEndpoint=None
            for r in response['VpcEndpoints']:
                # Match the endpoint if it serves either of the environment's subnets
                if subnetIds[0] in r['SubnetIds'] or subnetIds[1] in r['SubnetIds']:
                    # We are filtering describe by service name, so this must be SQS
                    sqsVpcEndpoint=r
                    break
            
            if sqsVpcEndpoint:
                logger.info("Found SQS endpoint: " + sqsVpcEndpoint['VpcEndpointId'])
    
                logger.info(sqsVpcEndpoint)
                pd = json.loads(sqsVpcEndpoint['PolicyDocument'])
                for s in pd['Statement']:
                    if s['Effect']=='Allow':
                        resource = s['Resource']
                        # A single resource may be stored as a string; normalize to a list
                        if isinstance(resource, str):
                            resource = [resource]
                            s['Resource'] = resource
                        logger.info(resource)
                        if '*' in resource:
                            logger.info("'*' already allowed")
                        elif celeryExecutorQueue in resource:
                            logger.info("'"+celeryExecutorQueue+"' already allowed")
                        else:
                            # Add the Celery Executor Queue to the endpoint policy
                            resource.append(celeryExecutorQueue)
                            logger.info("Updating SQS policy to " + str(pd))
            
                            client.modify_vpc_endpoint(
                                VpcEndpointId=sqsVpcEndpoint['VpcEndpointId'],
                                PolicyDocument=json.dumps(pd)
                                )
                        break
            
            # create MWAA database endpoint
            logger.info("creating endpoint to " + databaseVpcEndpointService)
            endpointName=name+"-database"
            response = client.create_vpc_endpoint(
                VpcEndpointType="Interface",
                VpcId=vpcId,
                ServiceName=databaseVpcEndpointService,
                SubnetIds=subnetIds,
                SecurityGroupIds=securityGroupIds,
                TagSpecifications=[
                    {
                        "ResourceType": "vpc-endpoint",
                        "Tags": [
                            {
                                "Key": "Name",
                                "Value": endpointName
                            },
                        ]
                    },
                ],
            )
            logger.info("created VPCE: " + response['VpcEndpoint']['VpcEndpointId'])
                
            # create MWAA web server endpoint (if private)
            if webserverVpcEndpointService:
                endpointName=name+"-webserver"
                logger.info("creating endpoint to " + webserverVpcEndpointService)
                response = client.create_vpc_endpoint(
                    VpcEndpointType="Interface",
                    VpcId=vpcId,
                    ServiceName=webserverVpcEndpointService,
                    SubnetIds=subnetIds,
                    SecurityGroupIds=securityGroupIds,
                    TagSpecifications=[
                        {
                            "ResourceType": "vpc-endpoint",
                            "Tags": [
                                {
                                    "Key": "Name",
                                    "Value": endpointName
                                },
                            ]
                        },
                    ],
                )
                logger.info("created VPCE: " + response['VpcEndpoint']['VpcEndpointId'])
    
        return {
            'statusCode': 200,
            'body': json.dumps(event['detail']['status'])
        }

  6. Choose Deploy.
  7. On the Configuration tab of the Lambda function, in the General configuration section, choose Edit.
  8. For Timeout, increase to 5 minutes, 0 seconds.
  9. Choose Save.
  10. In the Permissions section, under Execution role, choose the role name to edit the permissions of this function.
  11. For Permission policies, choose the link under Policy name.
  12. Choose Edit and add a comma and the following statement:
    {
        "Sid": "Statement1",
        "Effect": "Allow",
        "Action": [
            "ec2:DescribeVpcEndpoints",
            "ec2:CreateVpcEndpoint",
            "ec2:ModifyVpcEndpoint",
            "ec2:DescribeSubnets",
            "ec2:CreateTags"
        ],
        "Resource": [
            "*"
        ]
    }

The complete policy should look similar to the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:us-east-1:112233445566:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:112233445566:log-group:/aws/lambda/mwaa-create-lambda:*"
            ]
        },
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVpcEndpoints",
                "ec2:CreateVpcEndpoint",
                "ec2:ModifyVpcEndpoint",
                "ec2:DescribeSubnets",
                "ec2:CreateTags"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

  13. Choose Next until you reach the review page.
  14. Choose Save changes.
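The same permissions can also be granted from a script. The sketch below takes a slightly different route than the console steps: instead of editing the existing policy, it attaches a separate inline policy to the function's execution role with the IAM put_role_policy call. The policy name and function names are hypothetical, and the role name must be the one Lambda created for your function.

```python
import json

# Actions the Lambda function in this post needs on EC2.
EC2_ACTIONS = [
    "ec2:DescribeVpcEndpoints",
    "ec2:CreateVpcEndpoint",
    "ec2:ModifyVpcEndpoint",
    "ec2:DescribeSubnets",
    "ec2:CreateTags",
]


def endpoint_policy_document():
    """Build a standalone policy document containing only the EC2 statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Action": EC2_ACTIONS,
                "Resource": ["*"],
            }
        ],
    }


def attach_endpoint_policy(role_name, policy_name="mwaa-endpoint-management"):
    """Attach the statement to the execution role as an inline policy."""
    import boto3  # imported here so the document builder works without boto3

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(endpoint_policy_document()),
    )
```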

Create an EventBridge rule

Next, we configure EventBridge to send the Amazon MWAA notifications to our Lambda function.

  1. On the EventBridge console, choose Create rule.
  2. For Name, enter mwaa-create.
  3. Select Rule with an event pattern.
  4. Choose Next.
  5. For Creation method, choose Use pattern form.
  6. Choose Edit pattern.
  7. For Event pattern, enter the following:
    {
      "source": ["aws.airflow"],
      "detail-type": ["MWAA Environment Status Change"]
    }

  8. Choose Next.
  9. For Select a target, choose Lambda function.

You could also specify an SNS notification in order to receive a message when the environment state changes.

  10. For Function, choose mwaa-create-lambda.
  11. Choose Next until you reach the final section, then choose Create rule.
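The rule can also be created programmatically. The following sketch is one way to do it with boto3, under the assumption that the Lambda function already exists and you know its ARN; the function names and the optional pending_only narrowing are our additions, not part of the console walkthrough. Unlike the console, the API does not grant EventBridge permission to invoke the function, so the sketch adds it explicitly.

```python
import json


def build_event_pattern(pending_only=False):
    """Build the event pattern; optionally match only the PENDING status."""
    pattern = {
        "source": ["aws.airflow"],
        "detail-type": ["MWAA Environment Status Change"],
    }
    if pending_only:
        # Optional narrowing so the function is not invoked for other states.
        pattern["detail"] = {"status": ["PENDING"]}
    return pattern


def create_rule(function_arn, rule_name="mwaa-create"):
    """Create the rule, allow it to invoke the function, and add the target."""
    import boto3  # imported here so the pattern builder works without boto3

    events = boto3.client("events")
    rule_arn = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )["RuleArn"]
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "mwaa-create-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

The Lambda function in this post already ignores events whose status is not PENDING, so the narrower pattern only reduces invocations; it does not change behavior.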

Create an Amazon MWAA environment

Finally, we create an Amazon MWAA environment with customer-managed endpoints.

  1. On the Amazon MWAA console, choose Create environment.
  2. For Name, enter a unique name for your environment.
  3. For Airflow version, choose the latest Airflow version.
  4. For S3 bucket, choose Browse S3 and choose your S3 bucket, or enter the Amazon S3 URI.
  5. For DAGs folder, choose Browse S3 and choose the dags/ folder in your S3 bucket, or enter the Amazon S3 URI.
  6. Choose Next.
  7. For Virtual Private Cloud, choose the VPC you created earlier.
  8. For Web server access, choose Public network (Internet accessible).
  9. For Security groups, deselect Create new security group.
  10. Choose the shared VPC security group created by the CloudFormation template.

Because the security groups of the AWS PrivateLink endpoints from the earlier step are self-referencing, you must choose the same security group for your Amazon MWAA environment.

  11. For Endpoint management, choose Customer managed endpoints.
  12. Keep the remaining settings as default and choose Next.
  13. Choose Create environment.

When your environment is available, you can access it via the Open Airflow UI link on the Amazon MWAA console.
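For reference, the equivalent API request might be sketched as follows with boto3. We assume here that an execution role already exists (the console can create one for you, the API cannot), and all ARNs and IDs are values you supply; the function names are hypothetical. The EndpointManagement value CUSTOMER corresponds to the customer-managed endpoints choice.

```python
def build_environment_request(name, bucket_arn, execution_role_arn,
                              subnet_ids, security_group_ids):
    """Build the keyword arguments for the MWAA create_environment call."""
    return {
        "Name": name,
        "SourceBucketArn": bucket_arn,
        "DagS3Path": "dags",  # the dags folder from the prerequisites
        "ExecutionRoleArn": execution_role_arn,
        "NetworkConfiguration": {
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": security_group_ids,
        },
        "WebserverAccessMode": "PUBLIC_ONLY",
        # Ask MWAA to wait for endpoints that we create ourselves.
        "EndpointManagement": "CUSTOMER",
    }


def create_environment(**settings):
    """Submit the request and return the new environment's ARN."""
    import boto3  # imported here so the request builder works without boto3

    request = build_environment_request(**settings)
    return boto3.client("mwaa").create_environment(**request)["Arn"]
```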

Clean up

Cleaning up resources that aren’t actively being used reduces costs and is a best practice. If you don’t delete your resources, you can incur additional charges. To clean up your resources, complete the following steps:

  1. Delete your Amazon MWAA environment, EventBridge rule, and Lambda function.
  2. Delete the VPC endpoints created by the Lambda function.
  3. Delete any security groups created, if applicable.
  4. After the above resources have completed deletion, delete the CloudFormation stack to ensure that you have removed all of the remaining resources.
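The endpoints created by the Lambda function carry a Name tag built from the environment name, so one possible way to find and delete them is sketched below. The function names are hypothetical, and the sketch assumes no unrelated endpoints in the account and Region share the same Name tags.

```python
def endpoint_names(environment_name, private_webserver=False):
    """Return the Name tags the Lambda function assigns to its endpoints."""
    names = [environment_name + "-database"]
    if private_webserver:
        names.append(environment_name + "-webserver")
    return names


def delete_environment_endpoints(environment_name, private_webserver=False):
    """Delete the endpoints tagged for this environment and return their IDs."""
    import boto3  # imported here so the name helper works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.describe_vpc_endpoints(
        Filters=[
            {
                "Name": "tag:Name",
                "Values": endpoint_names(environment_name, private_webserver),
            }
        ]
    )
    endpoint_ids = [e["VpcEndpointId"] for e in response["VpcEndpoints"]]
    if endpoint_ids:
        ec2.delete_vpc_endpoints(VpcEndpointIds=endpoint_ids)
    return endpoint_ids
```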

Summary

This post described how to automate environment creation with shared VPC support in Amazon MWAA. This gives you the ability to manage your own endpoints within your VPC, adding compatibility to shared, or otherwise restricted, VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your Amazon MWAA environments. To learn more about Amazon MWAA, refer to the Amazon MWAA User Guide. For more posts about Amazon MWAA, visit the Amazon MWAA resources page.


About the author

John Jackson has over 25 years of software experience as a developer, systems architect, and product manager in both startups and large corporations and is the AWS Principal Product Manager responsible for Amazon MWAA.
