Fedora Data Working Group (FDWG)
Putting lines on dots
We manage Fedora's data infrastructure: collecting, storing, and making sense of the data that flows through the Fedora Project. Our primary purpose is to provide decision support to CommOps through community health analytics.
👋 New here?
Follow these steps to get oriented:
What is FDWG?
Read our contributors guide to understand who we are and what we do.
Find us on Matrix!
If you're new, come join us in #data on Matrix to get started!
To get a feel for what's happening, check our Radar, which shows (most of) our work in progress (WIP) and On-Deck items.
🔑 Key Projects
Hatlas: our data platform and development environment. Hatlas started as a personal dev environment and grew into the shared infrastructure that enables FDWG's data science work. It provides:
- Community tooling for queries, dashboards, and collaborative exploration
- Raw data to enable local processing with preferred tools
- The infrastructure for all data engineering (move, clean, augment, optimize)
Datanommer: Fedora's message bus archiver. Ingests messages and stores them as Parquet files for analysis. Related repos:
- datanommer-parquet-download: download Datanommer Parquet data
- datanommer-mini: lightweight local instance
- datanommer-docker: Docker setup
- datanommer-dictionary: message schema knowledge capture
docs (WIP / TODO): Public-facing FDWG documentation published to docs.fedoraproject.org.
📬 Contact & Communication
- Private comms: Use our FAS emails for anything that shouldn't be public.