WANdisco LiveData Migrator closes last mile gap to move Hadoop data and Hive metadata directly into Delta Lake on Databricks, enabling faster adoption of AI/ML
San Ramon, CA /PRNewswire/ - WANdisco, the LiveData company, announced today that its LiveData Migrator platform, which automates the migration and replication of Hadoop data from on-premises to the cloud, can now automate the migration of Apache Hive metadata directly into Databricks to help users save time, reduce costs, and more quickly enable new AI and machine learning capabilities. For the first time, enterprises that want to migrate on-premises Hadoop and Spark content from Hive to Databricks can do so at scale and with high efficiency, while mitigating the many risks associated with large-scale cloud migrations.
- Data sets do not need to be migrated in full before they are converted into the Delta format. LiveData Migrator automates incremental transformation to Delta Lake.
- Accelerate time to business insights by eliminating the need for manual data mappings with direct, native access to structured data in Databricks from on-premises environments.
- Use a single pane of glass to manage both Hadoop data and Hive metadata migrations.
Ongoing changes to source metadata are reflected immediately in Databricks' Lakehouse platform, and on-premises data formats used in Hadoop and Hive are automatically made available in Delta Lake on Databricks. By combining data and metadata and making on-premises content immediately usable in Databricks, users can eliminate migration tasks that previously required constructing data pipelines to transform, filter and adjust data, along with the significant up-front planning and staging. The work otherwise required to set up auto-load pipelines that identify newly landed data and convert it to a final form as part of a processing pipeline is also eliminated.
"This new feature brings together the power of Databricks and WANdisco LiveData Migrator," said WANdisco CTO Paul Scott-Murphy. "Data and metadata are migrated automatically without any disruption or change to existing systems. Teams can implement their cloud modernization strategies without risk, immediately employing workloads and data that were locked up on-premises, now in the cloud using the lakehouse platform offered by Databricks."
"Enterprises want to break silos and bring all their data into a lakehouse for analytics and AI but they have been constrained by their on-premises infrastructure," said Pankaj Dugar, Vice President of Product Partnerships at Databricks. "With the new Hive metadata capabilities in WANdisco's LiveData Migrator, it will now be much easier to take advantage of Databricks' Lakehouse Platform."
LiveData Migrator automates cloud data migration at any scale by enabling companies to easily migrate data from on-premises Hadoop-oriented data lakes to any cloud within minutes, even while the source data sets are under active change. Businesses can migrate their data without the expertise of engineers or other consultants to enable their digital transformation. LiveData Migrator works without any production system downtime or business disruption while ensuring the migration is complete and continuous and any ongoing data changes are replicated to the target cloud environment.
Making Hive data and metadata available for direct use in Delta Lake in Databricks can be achieved by configuring LiveData Migrator to have a data migration target available for the chosen cloud storage and Databricks. Users choose to convert content to the Delta Lake format when they create the Databricks metadata target. The desired data to migrate is then set by defining a migration rule, and selecting the Hive databases and tables that require migration.
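The configuration steps described above can be pictured with a small sketch. It is not the LiveData Migrator API: the `MetadataTarget` and `MigrationRule` classes, the glob-style patterns, and the sample catalog are all hypothetical, chosen only to illustrate how a Delta conversion choice made on the target combines with a rule that selects Hive databases and tables.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class MetadataTarget:
    """Hypothetical stand-in for a Databricks metadata target."""
    name: str
    convert_to_delta: bool  # assumed to be chosen when the target is created


@dataclass(frozen=True)
class MigrationRule:
    """Hypothetical rule: glob patterns naming Hive databases and tables."""
    database_pattern: str
    table_pattern: str

    def matches(self, database: str, table: str) -> bool:
        # Both the database name and the table name must match their pattern.
        return (fnmatchcase(database, self.database_pattern)
                and fnmatchcase(table, self.table_pattern))


def plan_migration(catalog, rule, target):
    """List the tables the rule selects and the format each would land in."""
    output_format = "delta" if target.convert_to_delta else "source"
    return [
        {"database": db, "table": tbl, "format": output_format}
        for db in sorted(catalog)
        for tbl in sorted(catalog[db])
        if rule.matches(db, tbl)
    ]


# Illustrative Hive catalog; the names are made up for this sketch.
catalog = {
    "sales": ["orders", "orders_archive", "customers"],
    "hr": ["employees"],
}
target = MetadataTarget(name="databricks-delta", convert_to_delta=True)
rule = MigrationRule(database_pattern="sales", table_pattern="orders*")
plan = plan_migration(catalog, rule, target)
```

In this sketch the conversion choice lives on the target rather than on the rule, mirroring the description that users opt into the Delta Lake format when they create the Databricks metadata target.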
Learn more about successful strategies for Hadoop-to-cloud migration at the Databricks Data+AI Summit 2021, with sessions including accelerating analytics on Databricks (Wed., May 26, 4:25 p.m. PT) and Spark-based analytics by minimizing barriers of Hadoop migration (Thurs., May 27, 11:35 a.m. PT), presented by WANdisco, Databricks, Microsoft and Avanade.
WANdisco is the LiveData company. WANdisco solutions enable enterprises to create an environment where data is always available, accurate and protected, creating a strong backbone for their IT infrastructure and a bedrock for running consistent, accurate machine learning applications. With zero downtime and zero data loss, WANdisco LiveData Platform keeps geographically dispersed data at any scale consistent between on-premises and cloud environments, allowing businesses to operate seamlessly in a hybrid or multi-cloud environment. For more information on WANdisco, visit www.wandisco.com.