
# README

**Description**:  
This script (`main.py`) fetches data from the Fedora DataGrepper API over a specified date range and saves the results as Parquet files. By default, it fetches data for the last 10 days.
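One way to picture the default 10-day range is as a list of consecutive one-day windows ending now. The sketch below is only an illustration: the function name `daily_windows` and the choice of one-day windows are assumptions, not necessarily how `main.py` divides the range.

```python
from datetime import datetime, timedelta, timezone


def daily_windows(days=10, now=None):
    """Split the last `days` days into consecutive (start, end) windows.

    Hypothetical helper: main.py may split the date range differently.
    """
    end = now or datetime.now(timezone.utc)
    cursor = end - timedelta(days=days)
    windows = []
    while cursor < end:
        # Each window covers at most one day and never runs past `end`.
        nxt = min(cursor + timedelta(days=1), end)
        windows.append((cursor, nxt))
        cursor = nxt
    return windows


if __name__ == "__main__":
    for start, stop in daily_windows():
        print(start.isoformat(), "->", stop.isoformat())
```

Splitting the range into small windows makes it easy to fetch each window independently and to retry only the window that failed.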

The script uses concurrency to speed up data fetching and handles potential request failures by retrying. It also cleans the data by removing non-UTF8 characters before saving it as compressed Parquet files.
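The retry, concurrency, and cleaning steps described above could look roughly like the following. This is a minimal sketch using only the standard library: the names `with_retries`, `strip_invalid_utf8`, and `fetch_all`, the thread-pool approach, and the linear backoff are all assumptions rather than the actual contents of `main.py`. The HTTP request and the Parquet write are left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(func, attempts=3, delay=0.5):
    """Call func(); on an exception, wait and retry, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # linear backoff (an assumption)


def strip_invalid_utf8(text):
    """Drop characters that cannot be encoded as UTF-8, such as lone surrogates."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def fetch_all(fetch_one, windows, max_workers=4, attempts=3, delay=0.5):
    """Run fetch_one(window) for every window in a thread pool, with retries.

    `fetch_one` is a hypothetical callable supplied by the caller.
    Results come back in the same order as `windows`.
    """
    def task(window):
        return with_retries(lambda: fetch_one(window), attempts, delay)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, windows))
```

Threads suit this kind of work because the time is spent waiting on network responses rather than on computation.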

At the bottom of the `main.py` file, there is an example usage showing how to run the script with the default settings (a 10-day range).

**Running the script**:  
1. Install the required packages:  
   `pip install -r requirements.txt`

2. Run the script directly:  
   `python main.py`

By default, this command fetches data from the past 10 days and stores it in an `output/` directory in the location where the script is run.