Wednesday, February 7, 2024

Working with the com.crealytics.spark.excel package for Excel files in Azure Synapse

This post is for anyone trying to get the com.crealytics.spark.excel package up and running in a Synapse workspace and on a Spark pool. I will try to explain it in the simplest possible steps.

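Step 1 - Go to the Maven Repository and download the latest jar file for the crealytics spark-excel package. The jar name encodes the Scala and Spark versions it was built for (for example, spark-excel_2.12-3.5.0_0.20.3.jar), so pick the one that matches your Spark pool.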



Step 2 - Once the file is downloaded, go to your Synapse workspace, open the Manage tab, and then open the Workspace packages tab.


Step 3 - Upload the jar file to Workspace packages. It should show up on the list with a provisioning status of Succeeded, see below.



Step 4 - Once the package is uploaded, go to Manage > Spark pool > Packages and select spark-excel_2.12-3.5.0_0.20.3.jar from the list. It is important that session-level packages are allowed and that the Spark pool is restarted after this step. See screenshots below.


That's it on the configuration side. Now, in your notebook, you can use a code snippet like the one below to read from an Excel file.

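df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")        # treat the first row as column names
      .option("inferSchema", "true")   # infer column types from the data
      .load(ReadPath))                 # ReadPath: path to the Excel file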

where ReadPath contains the path to the Excel file in your data lake. You can play around with more options on this piece of code. Hope this helps. Please let us know in the comments.
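As an example of playing around with more options, here is a minimal sketch that builds the read path and the reader options separately, so they are easy to tweak. The helper names (abfss_path, excel_options, read_excel) and the container, account, and file names are made up for illustration and are not part of the package. The sketch assumes your data lake is ADLS Gen2 reached through an abfss:// URI, and it uses the package's dataAddress option to pick a sheet and cell range.

```python
def abfss_path(container, account, relative_path):
    """Build an ADLS Gen2 URI. Hypothetical helper, not part of any library."""
    clean = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean}"


def excel_options(sheet=None, cell_range=None, header=True, infer_schema=True):
    """Collect spark-excel reader options as a plain dict of strings."""
    opts = {
        "header": str(header).lower(),             # "true" or "false"
        "inferSchema": str(infer_schema).lower(),  # "true" or "false"
    }
    if sheet is not None:
        # dataAddress looks like 'Sheet Name'!A1 or 'Sheet Name'!B2:D10
        start = cell_range or "A1"
        opts["dataAddress"] = f"'{sheet}'!{start}"
    return opts


def read_excel(spark, path, **kwargs):
    """Read an Excel file with the given SparkSession (needs the jar on the pool)."""
    reader = spark.read.format("com.crealytics.spark.excel")
    return reader.options(**excel_options(**kwargs)).load(path)


# In a Synapse notebook, where `spark` already exists (names are placeholders):
# ReadPath = abfss_path("raw", "mylake", "sales/2024.xlsx")
# df = read_excel(spark, ReadPath, sheet="Sheet1", cell_range="B2:D10")
```

Keeping the options in a dict makes it easy to print them and check exactly what is being passed to the reader when a file does not load the way you expect.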


Note: If you have higher environments (for example, test or production), make sure you repeat these steps there as well.
