Only in Mage Pro. Try our fully managed solution to access this advanced feature.
How to Use PySpark in Mage Pro
Follow these steps to run PySpark code in Mage Pro:
- Create a batch pipeline in the Mage UI.
- Add a block of type: Data Loader, Transformer, Data Exporter, or Custom.
- In your block, write PySpark code using the provided SparkSession.
- Install or mount any required Spark JARs, such as those for Iceberg or cloud storage access.
Example Pipeline
Create a standard batch pipeline and configure the following settings in the pipeline's metadata.yaml file to ensure PySpark works properly:
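The exact settings depend on your cluster setup. As a minimal sketch, assuming Mage's spark_config convention applies to the pipeline's metadata.yaml (all values below are placeholders):

```yaml
# Hypothetical metadata.yaml settings for a PySpark pipeline.
# Key names follow Mage's spark_config convention; adjust for your cluster.
spark_config:
  app_name: 'my_pyspark_pipeline'   # placeholder application name
  spark_master: 'local[*]'          # or your cluster's master URL
  spark_jars:                       # optional JARs, e.g. Iceberg or GCS connectors
    - '/path/to/gcs-connector.jar'  # placeholder path
```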
Data Loader Block (PySpark)
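A minimal sketch of a PySpark data loader, assuming the provided SparkSession is exposed to blocks via kwargs['spark'] and reading from a hypothetical GCS path:

```python
from pyspark.sql import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs) -> DataFrame:
    # SparkSession provided by Mage Pro; the 'spark' kwarg is an assumption here
    spark = kwargs['spark']

    # Hypothetical source path; replace with your own bucket and object
    return spark.read.csv(
        'gs://example-bucket/input/events.csv',
        header=True,
        inferSchema=True,
    )
```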
Data Exporter Block
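A matching exporter sketch, under the same assumptions, writing the loaded DataFrame to a hypothetical cloud storage path:

```python
if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(df, *args, **kwargs):
    # Hypothetical destination; replace with your own bucket and path
    df.write.mode('overwrite').parquet('gs://example-bucket/output/events/')
```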
Benefits of Running PySpark in Mage Pro
Mage Pro handles all the infrastructure so you can focus on your PySpark code:
- ⚙️ Distributed execution with automatic pod scheduling and resource allocation
- ☁️ Seamless cloud integration with GCS, S3, and service account/IAM-based authentication
- 🧩 Support for Spark JARs and connectors like Apache Iceberg, GCS connectors, Delta Lake, and more
- 📈 Built-in observability, with access to logs, resource usage, and block-level monitoring in the Mage UI
Notes
- You can customize the SparkSession in any block using .builder.config(...) to tune performance or integrate external tools; see the sketch after this list.
- Cloud storage credentials (e.g., a GCP service account key or AWS credentials) must be mounted and accessible inside the Mage Pro cluster.
- For advanced use cases (e.g., Apache Iceberg), see the Iceberg + PySpark guide.
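As a rough illustration of the first note: SparkSession.builder.getOrCreate() returns the active session if one exists, and runtime SQL configurations set via .config(...) apply to it. The option values below are placeholders, and only modifiable runtime configs reliably take effect on an already-running session:

```python
from pyspark.sql import SparkSession

# Reuse (or create) the session and tune runtime settings.
spark = (
    SparkSession.builder
    .config('spark.sql.shuffle.partitions', '64')   # placeholder parallelism
    .config('spark.sql.session.timeZone', 'UTC')    # placeholder time zone
    .getOrCreate()
)
```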