
Wednesday, January 18, 2023

Create Power BI user for Synapse Serverless SQL Pool to access External Delta Lake Tables

Problem Statement: You are trying to create a separate user for reading datasets in Power BI. 

Issue: After creating the user login, you face the following error when trying to read your external tables or views:

Msg 13807, Level 16, State 1, Procedure Dim_ABC, Line 2 [Batch Start Line 0]
Content of directory on path 'https://data.dfs.core.windows.net/data/Gold/Dim_ABC/_delta_log/*.*' cannot be listed.
Msg 4413, Level 16, State 1, Line 1
Could not use view or function 'dbo.Dim_ABC' because of binding errors.


Solution: 

Step 1: Create a Master Key

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '*******'

Step 2: Create a Database Scoped Credential with Managed Identity

CREATE DATABASE SCOPED CREDENTIAL [PowerBICredential]
WITH IDENTITY = 'Managed Identity' 

Step 3: Create a Power BI login

CREATE LOGIN PowerBI_RO WITH PASSWORD = '*******';

GO

CREATE USER PowerBI_RO FROM LOGIN PowerBI_RO

GO

ALTER ROLE db_datareader ADD MEMBER PowerBI_RO;


Step 4: Grant the Power BI login access to the scoped credential

GRANT CONTROL TO PowerBI_RO

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[PowerBICredential] TO PowerBI_RO


Step 5 - Create an External Data Source associated to this credential

CREATE EXTERNAL DATA SOURCE [DeltaLakeStorage]
WITH (
    LOCATION = 'abfss://containername@datalakename.dfs.core.windows.net/',
    CREDENTIAL = [PowerBICredential]
)


Step 6: Log in as the new user and run a SELECT statement; you should now be able to query the Delta Lake tables.
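As a quick sketch, assuming the view dbo.Dim_ABC from the error message above, the verification query could look like:

```sql
-- Connect as PowerBI_RO, then query the view that previously failed with Msg 13807.
SELECT TOP 10 * FROM dbo.Dim_ABC;
```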


Update (26th January): The Microsoft Synapse Analytics YouTube channel has recently published a video tutorial on how to do this - https://www.youtube.com/watch?v=_OZ7XUapJZs 

Wednesday, November 23, 2022

Synapse Analytics - Create Delta Lake Database or Schema or Table fails with java.lang.IllegalArgumentException: null

Issue - Creating a Delta Lake database, schema, or table fails with the error java.lang.IllegalArgumentException: null path. 

Error sample screenshot


For a quick background, I am using a small-sized cluster with 3-8 worker nodes and a Synapse notebook to execute the syntax. 

The error log is as follows:

Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: null path
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:153)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:133)

It hints at a path being null. 

Resolution:

The very first thing to check is whether the path you are passing while creating the table is correct. For example, see the syntax below:

CREATE TABLE lakedatabasename.tablename
USING DELTA
LOCATION 'abfss://container@datalake.dfs.core.windows.net/Path'
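Before running the DDL at all, a malformed LOCATION URI is one way to end up with a null path. A small hypothetical helper (the function name and pattern are mine, not part of Synapse) can sanity-check the URI shape in a notebook cell:

```python
import re

def is_valid_abfss_uri(uri: str) -> bool:
    """Check that a URI looks like abfss://container@account.dfs.core.windows.net/optional/path."""
    pattern = r"^abfss://[^@/]+@[^/]+\.dfs\.core\.windows\.net(/.*)?$"
    return re.match(pattern, uri) is not None

print(is_valid_abfss_uri("abfss://container@datalake.dfs.core.windows.net/Path"))  # True
print(is_valid_abfss_uri("abfss://datalake.dfs.core.windows.net/Path"))            # False: container@ missing
```

This only validates the URI format; whether the path actually exists still needs the listing check below.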

Check whether the path exists by running the below syntax in a Spark notebook: 

mssparkutils.fs.ls('abfss://container@datalake.dfs.core.windows.net/Path')

This should list all the files you have under the path.



If files get listed, it's not an issue with your path and you can rule that out.

The next thing to check is whether the user creating the Delta Lake database/table has the 'Storage Blob Data Contributor' role on the storage account; this would be needed. 
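As a sketch, the role can be assigned with the Azure CLI; the object ID and resource ID below are placeholders for your own values:

```
az role assignment create \
  --assignee "<user-or-service-principal-object-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
```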

For me, both of the above debug steps were okay. Later I read that in Synapse, Delta Lake creates the metadata for a lake database in the default container. I had given the default container the same name as my storage account during creation, and had since deleted it. 

This is where the template I extracted during resource creation came in handy. I checked the template and found that the 'defaultDataLakeStorageFilesystemName' parameter had the same name as well; see the screenshot below. 



The fix was to recreate the default container, so Synapse can create the Delta Lake database metadata inside it, and then re-run the same commands.
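Recreating the container can also be done from the Azure CLI; as a sketch, with the container and account names as placeholders:

```
# Recreate the workspace's default container (use the name from
# the 'defaultDataLakeStorageFilesystemName' template parameter).
az storage container create \
  --name "<default-container-name>" \
  --account-name "<storage-account-name>" \
  --auth-mode login
```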




The issue was raised by me and documented in the Microsoft Learn community here.