Data & AI
Databricks, Data Lake Store, Scala
The customer is an American multinational computer software company that develops content creation products.
The customer had embarked on a multi-year initiative to move its on-premises Cloudera Hadoop instance to Azure.
The existing customer scoring and cross-sell recommendation solutions, built on the Hadoop MapReduce engine and Hive queries (HQL), faced several challenges.
WinWire, in collaboration with the customer, set out to address the existing challenges by adopting a Spark-based architecture on Azure Databricks.
The WinWire team performed an end-to-end assessment of the existing MapReduce code and Hive queries and designed a modern data pipeline to migrate them to Spark code. This transition enabled the customer to process data faster and improved overall job performance, reducing execution time by more than 50%.
We migrated the Hive HQL and MapReduce code to Spark/Scala using automation based on design pattern analysis. As part of integration testing, the data produced by the old and new processes was validated at the target table level. Metadata validation was also performed, covering the number of columns and the data type of every column.
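The validation described above can be illustrated with a small, library-free sketch. It is not the tooling used on the project: the schema representation (a list of column name and data type pairs), the row representation (tuples), and the function names `compare_schemas` and `compare_rows` are all assumptions made for illustration. In a real pipeline the same checks would typically run against the table schemas and data read by Spark.

```python
from collections import Counter

# Hypothetical representation: a schema is an ordered list of
# (column_name, data_type) pairs; a table is a list of row tuples.


def compare_schemas(legacy, migrated):
    """Return a list of metadata mismatches (empty when the schemas agree)."""
    issues = []

    # Check 1: number of columns.
    if len(legacy) != len(migrated):
        issues.append(f"column count differs: {len(legacy)} vs {len(migrated)}")

    legacy_types = dict(legacy)
    migrated_types = dict(migrated)

    # Check 2: every legacy column exists in the migrated table with the same type.
    for name, dtype in legacy:
        if name not in migrated_types:
            issues.append(f"missing column: {name}")
        elif migrated_types[name] != dtype:
            issues.append(
                f"type differs for {name}: {dtype} vs {migrated_types[name]}"
            )

    # Check 3: the migrated table has no columns the legacy table lacks.
    for name, _ in migrated:
        if name not in legacy_types:
            issues.append(f"unexpected column: {name}")

    return issues


def compare_rows(legacy_rows, migrated_rows):
    """Compare two tables as multisets of rows, ignoring row order.

    Returns (rows only in legacy, rows only in migrated) as Counters,
    so duplicate rows are counted rather than collapsed.
    """
    legacy_counts = Counter(legacy_rows)
    migrated_counts = Counter(migrated_rows)
    return legacy_counts - migrated_counts, migrated_counts - legacy_counts


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    legacy_schema = [("customer_id", "bigint"), ("score", "double")]
    migrated_schema = [("customer_id", "bigint"), ("score", "float")]
    print(compare_schemas(legacy_schema, migrated_schema))

    only_old, only_new = compare_rows([(1, 0.9), (2, 0.4)], [(2, 0.4), (1, 0.9)])
    print(only_old, only_new)
```

Comparing rows as multisets keeps the check independent of row order, which generally differs between a MapReduce job and a Spark job, while still catching dropped or duplicated rows.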