The Linux Foundation’s Open Data Platform initiative (ODPi) today unveiled its Egeria project as a way to standardize the flow of metadata between different big data technologies and vendor platforms. Egeria advances the main goal of ODPi, which is to ensure that Apache Hadoop works smoothly across different big data solutions.
Egeria provides organizations with a simplified view to locate, manage, and use vast data resources more efficiently.
The project creates a set of open APIs, types, and interchange protocols that allow metadata repositories to share and exchange information. It then adds governance, discovery, and access frameworks to automate the collection, management, and use of metadata across an organization. The initial Egeria release also supports data privacy regulations tied to GDPR.
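To make the exchange idea concrete, here is a minimal Python sketch of two repositories sharing metadata through a common record type and a JSON interchange format. It is an illustration only, not Egeria's actual API: the names `MetadataRecord`, `MetadataRepository`, `export_events`, `import_events`, and the `contains_personal_data` property are all hypothetical, and the sketch assumes each record has one "home" repository that owns it while other repositories hold reference copies.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MetadataRecord:
    """Hypothetical shared type describing one data asset."""
    guid: str
    type_name: str
    properties: dict = field(default_factory=dict)
    home_repository: str = ""


class MetadataRepository:
    """Hypothetical repository that speaks the shared interchange format."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def add(self, guid, type_name, **properties):
        # Records created here are owned ("homed") by this repository.
        record = MetadataRecord(guid, type_name, properties, self.name)
        self.records[guid] = record
        return record

    def export_events(self):
        # Serialize only the records this repository owns.
        return [
            json.dumps(asdict(record), sort_keys=True)
            for record in self.records.values()
            if record.home_repository == self.name
        ]

    def import_events(self, events):
        for event in events:
            record = MetadataRecord(**json.loads(event))
            existing = self.records.get(record.guid)
            # Store a reference copy, but never overwrite a locally owned record.
            if existing is None or existing.home_repository != self.name:
                self.records[record.guid] = record

    def find_by_type(self, type_name):
        return sorted(
            r.guid for r in self.records.values() if r.type_name == type_name
        )


hadoop_catalog = MetadataRepository("hadoop-catalog")
warehouse_catalog = MetadataRepository("warehouse-catalog")

hadoop_catalog.add("asset-1", "DataSet", name="clickstream",
                   contains_personal_data=True)
warehouse_catalog.add("asset-2", "DataSet", name="orders")

# Each repository publishes its own records and reads the other's.
warehouse_catalog.import_events(hadoop_catalog.export_events())
hadoop_catalog.import_events(warehouse_catalog.export_events())

print(warehouse_catalog.find_by_type("DataSet"))
```

After the exchange, either catalog can answer queries about assets that live in the other, which is the kind of consistent view the project is aiming for. A flag such as the assumed `contains_personal_data` property hints at how shared metadata could feed governance checks like those tied to GDPR.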
Metadata is data that describes other data. For an organization, this means the descriptions of all of its data assets, in all of their forms, regardless of where that data is located.
The Egeria project grew out of the Apache Atlas project, which was designed as an open source metadata repository for the Apache Hadoop ecosystem. IBM and ING donated code to that project to form Egeria.
“Changing the availability and the quality of metadata will in turn improve the agility of the data scientist, as well as the transparency of the results they produce,” said Mandy Chessell, distinguished engineer and master inventor at IBM, in a statement. “Egeria simplifies metadata capture and management to create a consistent view of data across all tools an organization may use.”
Hadoop has become a common platform for managing big data. However, it lacks tools that allow for more IT-centric control. Sean Suchter, CTO at Pepperdata, told SDxCentral last year that “[Hadoop] was not really developed with traditional IT in mind.” Suchter was part of the team at Yahoo that worked on creating Hadoop as a way to handle the company’s search efforts.
ODPi itself grew out of work done by vendors in the Hadoop ecosystem and was moved into the Linux Foundation in late 2015. Initial members included Hortonworks, IBM, Pivotal, and VMware.
Those efforts were somewhat controversial, as some members of the Hadoop ecosystem questioned the need for a vendor-driven initiative.