Wednesday, September 15, 2021

Databricks Cluster termination due to Cloud Provider Launch Failure - Allocation Failed

Databricks Cluster Termination due to Insufficient Resource Allocation

 

Issue: Restarting a terminated cluster fails with the following error message:

Cluster terminated. Reason: Cloud Provider Launch Failure

A cloud provider error was encountered while launching worker nodes. See the Databricks guide for more information.

Azure error code: AllocationFailed

Azure error message: Allocation failed. We do not have sufficient capacity for the requested VM size in this region. Read more about improving likelihood of allocation success at http://aka.ms/allocation-guidance


Cause: Resource allocation fails because Azure does not have sufficient capacity for the requested VM size in the region. This is a region-specific error: the region has no spare capacity of that VM size at the time of the request, so the worker nodes cannot be launched and the cluster cannot start.

Resolution: Microsoft has documented the resolutions in its article on allocation failures. The error message above also points to http://aka.ms/allocation-guidance.

Update: After waiting a while and restarting the cluster a couple of times, it started running again for us.
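
The wait-and-restart approach that worked for us can be automated. Below is a minimal sketch in Python, not an official Databricks or Azure method. The names `is_allocation_failure`, `restart_with_backoff` and `start_cluster` are hypothetical, and so are the attempt count and wait times. It assumes you supply your own `start_cluster` function that tries one cluster launch and returns an empty string on success, or the termination message on failure.

```python
import time
from typing import Callable


def is_allocation_failure(termination_message: str) -> bool:
    """Return True if the message carries the Azure AllocationFailed code."""
    return "AllocationFailed" in termination_message


def restart_with_backoff(
    start_cluster: Callable[[], str],
    max_attempts: int = 5,
    initial_wait_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a cluster launch while it keeps failing with AllocationFailed.

    start_cluster is a hypothetical callable (an assumption, not a real API):
    it attempts one launch and returns "" on success, or the termination
    message on failure. Returns the number of attempts that were needed.
    """
    wait_s = initial_wait_s
    for attempt in range(1, max_attempts + 1):
        message = start_cluster()
        if not message:
            return attempt  # the cluster launched
        if not is_allocation_failure(message):
            # Only capacity errors are worth waiting out.
            raise RuntimeError(f"Non-capacity failure, not retrying: {message}")
        if attempt < max_attempts:
            sleep(wait_s)  # give the region time to free up capacity
            wait_s *= 2    # wait twice as long before the next attempt
    raise RuntimeError(f"Cluster still failing after {max_attempts} attempts")
```

Doubling the wait between attempts avoids hammering a region that is already short on capacity. If the retries run out, the options in Microsoft's allocation guidance are the next step.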