Data & AI
Databricks, Data Lake Store, Scala
The customer is an American multinational computer software company in the content creation product development space.
The customer had embarked on a multiyear initiative focused on moving their on-prem Cloudera Hadoop instance to Azure.
The existing customer scoring and cross sell recommendation solutions built on Hadoop Map Reduce engine and Hive Queries (HQL) had following challenges.
- Delay in the recommendations reaching the sales team
- Infrastructure limited the scope of data to be evaluated for the scoring
- Difficult to manage the scoring parameter change requests from the business
WinWire, in collaboration with the customer has envisioned to address the existing challenges by adopting a Spark based architecture with Azure Databricks.
WinWire team did an end-to-end assessment of the existing MapReduce code, Hive queries and designed a modern data pipeline to Spark code seamlessly. This transition enabled the customer to process data faster and improved the overall performance of the job by reducing the executing time by more than 50%.
We migrated Hive HQL/MapReduce to Spark/Scala code based on design pattern analysis automation. Data between the old and new process was validated as part of integration testing at a target table level. Metadata validation was also performed like the data type of all columns and number of columns.
- Aligning with customer’s Data Science Teams
- Design an MLOps architecture based on best practices of Databricks and Azure
- Implementation of a framework based MLOps Solution with reusable code templates
- Improved experience through on time delivery of insights to the sales team
- Better opportunity identification and planning for improved business performance
- Eliminated manual effort of monitoring and reporting the scoring process for business through automated alerts