The Linux Foundation today added a new project, called DataPractices.org, which acts as a template for data best practices. The project will offer open coursework for data teamwork in an effort to create a vendor-neutral community to establish these practices and increase data knowledge.
The project was initially created as a data practices manifesto by data.world, a data catalog and analysis platform. The manifesto contains the values and principles that create an effective, modern, and ethical approach to data teamwork. According to Brett Hurt, data.world co-founder and CEO, the main goal of the project is to “raise the level of data literacy across the ecosystem.”
Data teamwork, said Hurt, is a method for bringing together “your data practitioners, subject matter experts, and other stakeholders by removing costly barriers to data discovery, comprehension, integration, and sharing.” He added that this method enables companies to “achieve anything with data, faster.”
Under the Linux Foundation, DataPractices.org will continue and further the work started by data.world’s manifesto. The manifesto is up on the Linux Foundation’s website (and available to sign) and contains a number of values and principles.
The four values listed are inclusion, experimentation, accountability, and impact. There are also 12 principles, many of which revolve around using data without bias, considering ethical implications, keeping practices for using and implementing data open and diverse, and protecting privacy and security.
As of press time, 1,728 individuals had signed the manifesto.
In addition, as a Linux Foundation project, DataPractices.org will offer coursework that is open to anyone interested and meant to improve data literacy. The curriculum covers the project life cycle, including how to start a data project, where to source data, and how to use it. It also includes a number of topics around data science, visualizations, and data ethics.
The project says it will rely on expert practitioners to refine and advance this coursework going forward. On the courseware page, it says the project “welcomes any contributions, refinements, or additions to this body of work.” Those interested in contributing are directed to contact the project or submit changes to its GitHub repository.
Hurt noted that the idea of creating a standard or initiative around building data best practices comes from the software development world. “The evolution there from disorganization, to the Waterfall Development model, and now the Agile movement has tackled many of the same challenges that the data ecosystem is facing today. Our hope is that by starting our own movement with a similar potential for impact as we saw with Agile, we’re able to accelerate the time to general proficiency and create a common baseline for all data practitioners,” he said.
The Waterfall Development model and Agile movement that he refers to are development methodologies for software and engineering design. The ‘Agile for Data’ model is something that data.world has adopted to apply some of these software methodologies and models to data.
Hurt said that the Linux Foundation is the right place to expand this movement and bring it to the broader industry that the foundation reaches. “For data teamwork to truly evolve and grow, data literacy needs to happen across all projects, teams, and organizations,” he said. “This is a cultural effort and will improve both open and closed data projects because it is focused on the individuals, rather than the processes.”