Wednesday, November 23, 2022

Synapse Analytics - Create Delta Lake Database or Schema or Table fails with java.lang.IllegalArgumentException: null

Issue: Creating a Delta Lake database, schema, or table fails with the error java.lang.IllegalArgumentException: null path.

Error sample screenshot


For a quick background, I am using a small-sized Spark pool with 3-8 worker nodes and running the commands from a Synapse notebook.

The error log reads as follows:

Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: null path
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:153)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:133)

It hints at a path being null. 

Resolution:

The very first thing to check is whether the path you are passing while creating the table is correct. For example, see the syntax below:

CREATE TABLE lakedatabasename.tablename
USING DELTA
LOCATION 'abfss://container@datalake.dfs.core.windows.net/Path'
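The same null-path error can surface when creating the lake database itself if its location is wrong. A minimal Spark SQL sketch, using placeholder names (substitute your own database, container, and path):

```sql
-- Placeholder names; the LOCATION must point at an existing container/path
CREATE DATABASE lakedatabasename
LOCATION 'abfss://container@datalake.dfs.core.windows.net/Path'
```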

Check whether the path exists by running the below syntax in a Spark notebook:

mssparkutils.fs.ls('abfss://container@datalake.dfs.core.windows.net/Path')

This should list all the files you have under the path.



If files get listed, it's not an issue with your path and you can rule that out.
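If you want to script this check, here is a minimal sketch. It assumes the behaviour of mssparkutils.fs.ls (which raises an exception for a non-existent path); in a Synapse notebook you would pass mssparkutils.fs.ls as the listing function, while a stub is used below for illustration:

```python
def path_exists(ls, path):
    """Return True if listing `path` with `ls` succeeds, False if it raises.

    In a Synapse notebook: path_exists(mssparkutils.fs.ls, path)
    """
    try:
        ls(path)
        return True
    except Exception:
        return False

# Stub listing function, for illustration only (placeholder paths).
fake_fs = {"abfss://container@datalake.dfs.core.windows.net/Path": ["part-0000.parquet"]}

def fake_ls(path):
    if path not in fake_fs:
        raise FileNotFoundError(path)
    return fake_fs[path]

print(path_exists(fake_ls, "abfss://container@datalake.dfs.core.windows.net/Path"))   # True
print(path_exists(fake_ls, "abfss://container@datalake.dfs.core.windows.net/Wrong"))  # False
```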

The next thing to check is whether the identity you are using to create the Delta Lake database/table has the 'Storage Blob Data Contributor' role on the storage account; this is required.
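If the role is missing, it can be assigned from the Azure portal or via the Azure CLI. A sketch of the CLI route, where every ID and name below is a placeholder for your own values:

```shell
# Placeholders throughout; requires an authenticated `az login` session
az role assignment create \
  --assignee "user@contoso.com" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<datalake>"
```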

For me, both of the above debug steps checked out. I later read that in Synapse, Delta Lake creates the lake database metadata in the workspace's default container. During resource creation I had given the default container the same name as my storage account, and I had since deleted that container.

This is where the template I extracted during resource creation came in handy. I checked the template and found I had used the same name for the 'defaultDataLakeStorageFilesystemName' parameter as well; see screenshot below.



The fix was to recreate the default container so Synapse could create the Delta Lake database metadata inside it, and then re-run the same commands.
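Recreating the container can be done from the portal or, as a sketch, with the Azure CLI (placeholder names; the container name must match the 'defaultDataLakeStorageFilesystemName' value from the workspace template):

```shell
# Placeholder names; requires an authenticated `az login` session
az storage container create \
  --account-name <datalake> \
  --name <default-container-name> \
  --auth-mode login
```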




The issue was raised by me and is documented in the Microsoft Learn community here.

Monday, November 21, 2022

Partitioned column data not displaying in External Table over Delta Lake in Serverless SQL Pool

Issue: You are trying to create an external table over a Delta Lake folder path in your Azure Data Lake. The Delta Lake table in question is partitioned over a few columns. For example, see below, where the table is partitioned on the fields 'TheYear' and 'TheMonth'.

Delta Lake

The path was then created as an external table in Azure Synapse and queried, and you can see all columns from the Delta Lake table except the partitioned columns; please see the screenshot below.



Cause: This appears to be a feature that serverless SQL pool does not yet support for external tables (reference).

Solution: There is an alternate way of achieving the same result: a SQL view built on OPENROWSET. You can then query this view instead of the external table.

Sample syntax below:

    CREATE VIEW vName AS
    SELECT TOP 10 *
    FROM OPENROWSET(
            BULK 'Gold/Database/Table1/',
            DATA_SOURCE = 'DeltaLakeStorage',
            FORMAT = 'delta'
        )
        WITH (
            TheYear VARCHAR(10),
            TheMonth VARCHAR(10),
            [type] VARCHAR(100),
            [count] BIGINT
        ) AS [result]
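Once the view exists, the partition columns can be selected and filtered like any other column. A usage sketch against the sample view above ('2022' and '11' are example values):

```sql
SELECT TheYear, TheMonth, [type], [count]
FROM vName
WHERE TheYear = '2022' AND TheMonth = '11'
```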