This project uses the free tier of GitHub Actions to run a webscraper. The YML files controling the scraping are located under .github/workflows. The parsed content is saved on rss_feeds_combined.txt and nyt.txt.
One of the YML files is pasted below as an example.
name: Use the standard Python library to parse RSS Feeds on: workflow_dispatch: schedule: – cron: ‘7 */12 * * *’ jobs: scheduled: runs-on: ubuntu-latest steps: – name: Check out this repo uses: actions/checkout@v4 with: fetch-depth: 0 – name: Commit and push if it changed run: | wget -O npr.xml https://feeds.npr.org/1001/rss.xml wget -O arstechnica.xml https://feeds.arstechnica.com/arstechnica/index wget -O wgrznews.xml https://www.wgrz.com/feeds/syndication/rss/news/local python3 pythonstdlibraryrss.py git config user.name “Automated” git config user.email “[email protected]” git add -A timestamp=$(date -u) git commit -m “Latest data: ${timestamp}” || exit 0 git push
Source