Thursday, August 31, 2023

ADF / Synapse - Error converting Excel to Parquet

Error in Synapse / Azure Data Factory

Aim - Build a pipeline that can take an excel as input in a file contianer and write into a parquet file. 

Issue:- When processing the file, the copy activity runs into an error saying one of the fields cannot have more than one value. 

"errorCode": "2200",

    "message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalStateException:field 42 (COUNTRY_NAME) can not have more than one value: [Ägypten]\ntotal entry:6\r\norg.apache.parquet.example.data.simple.SimpleGroup.add(SimpleGroup.java:106)\r\norg.apache.parquet.example.data.simple.SimpleGroup.add(SimpleGroup.java:196)\r\norg.apache.parquet.example.data.Group.add(Group.java:59)\r\norg.apache.parquet.example.data.Group.append(Group.java:137)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchWriter.readRow(ParquetBatchWriter.java:112)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchWriter.addRows(ParquetBatchWriter.java:60)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",

    "failureType": "UserError",

    "target": "Copy Bronze - Excel to Parquet",


Cause:- This has to with the excel file having two or more columns with the same name. We had the field "COUNTRY_NAME" twice in the excel and this was causing the issue. 

Fix: Remove the duplicated field and re-process the file again.