[Screenshot: sample of the error]
For a quick background: I am using a small cluster with 3-8 worker nodes and a Synapse notebook to execute the commands.
Error:
The error log reads as follows:
Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: null path
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:153)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:133)
It hints at a path being null.
Resolution:
The very first thing to check is whether the path you are passing while creating the table is correct; for example, see the syntax below:
CREATE TABLE lakedatabasename.tablename
USING DELTA
LOCATION 'abfss://container@datalake.dfs.core.windows.net/Path'
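If you prefer to run the DDL from a PySpark cell rather than a %%sql cell, a minimal sketch is below; the database, table, and path names are placeholders for your own values.

# Run the same DDL from a PySpark cell (names and path are placeholders).
spark.sql("""
CREATE TABLE lakedatabasename.tablename
USING DELTA
LOCATION 'abfss://container@datalake.dfs.core.windows.net/Path'
""")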
Check whether the path exists by using the below syntax in a Spark notebook:
mssparkutils.fs.ls('abfss://container@datalake.dfs.core.windows.net/Path')
and this should list all the files you have under the path.
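To make the check explicit, you can wrap the listing in a try/except; this is a minimal sketch, and the path is a placeholder for your own location:

# Verify the target path exists before creating the table (path is a placeholder).
path = 'abfss://container@datalake.dfs.core.windows.net/Path'
try:
    files = mssparkutils.fs.ls(path)
    print(f"Path exists, {len(files)} entries found")
except Exception as e:
    print(f"Path check failed, fix the location before creating the table: {e}")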
The next thing to check is that the user creating the Delta Lake database/table has the 'Storage Blob Data Contributor' role on the storage account; this is required.
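A quick way to confirm you have write access (a read-only role would pass the listing above but fail here) is to write and then delete a throwaway file; this is a sketch, and the file path is a placeholder:

# Write and remove a throwaway file to confirm contributor-level access.
test_file = 'abfss://container@datalake.dfs.core.windows.net/Path/_access_check.txt'
mssparkutils.fs.put(test_file, 'access check', True)  # True = overwrite if present
mssparkutils.fs.rm(test_file)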
For me, both of the above debug steps checked out. Later I read that in Synapse, Delta Lake creates the metadata for a lake database in the workspace's default container. During creation I had given the default container the same name as my storage account, and I had since deleted that container.
This is where the template I extracted during resource creation came in handy: I checked it and found that I had the same name in the 'defaultDataLakeStorageFilesystemName' parameter as well; see the screenshot below.
The fix was to recreate the default container so Synapse can create the Delta Lake database metadata inside it, and then re-run the same commands.
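If you prefer to recreate the container from code rather than the portal, a minimal sketch using the azure-storage-file-datalake package is below; the account URL and container name are placeholders, and it assumes your credential (here DefaultAzureCredential from azure-identity) is allowed to create containers on the account:

# Recreate the deleted default container (account URL and name are placeholders).
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://datalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
service.create_file_system(file_system="defaultcontainername")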
I raised this issue, and it is documented in the Microsoft Learn community here.