Error in Synapse / Azure Data Factory |
Aim - Build a pipeline that can take an excel as input in a file contianer and write into a parquet file.
Issue:- When processing the file, the copy activity runs into an error saying one of the fields cannot have more than one value.
"errorCode": "2200",
"message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalStateException:field 42 (COUNTRY_NAME) can not have more than one value: [Ägypten]\ntotal entry:6\r\norg.apache.parquet.example.data.simple.SimpleGroup.add(SimpleGroup.java:106)\r\norg.apache.parquet.example.data.simple.SimpleGroup.add(SimpleGroup.java:196)\r\norg.apache.parquet.example.data.Group.add(Group.java:59)\r\norg.apache.parquet.example.data.Group.append(Group.java:137)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchWriter.readRow(ParquetBatchWriter.java:112)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchWriter.addRows(ParquetBatchWriter.java:60)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "Copy Bronze - Excel to Parquet",
Cause:- This has to with the excel file having two or more columns with the same name. We had the field "COUNTRY_NAME" twice in the excel and this was causing the issue.
Fix: Remove the duplicated field and re-process the file again.