run link crawler as often as possible; update readme; fix workflows

This commit is contained in:
Il'ya (Marshal) 2021-04-26 22:27:57 +02:00
parent f3efcdafac
commit 0afa3e61e6
5 changed files with 29 additions and 10 deletions

View file

@ -24,6 +24,7 @@ jobs:
if: ${{ github.event.head_commit.author.name == 'GitHub Action' }}
env:
COMMIT_SHA: ${{ github.sha }}
GITHUB_PAT: ${{ secrets.PAT_FOR_ALERTS }}
TELEGRAM_BOT_TOKEN: ${{ secrets.TELEGRAM_BOT_TOKEN }}
run: |
pip install -r requirements.txt

View file

@ -2,8 +2,9 @@ name: Fetch new content of tracked links to files
on:
schedule:
- cron: '* * * * * '
- cron: '* * * * *'
push:
# trigger on updated linkbase
branches:
- main

View file

@ -2,10 +2,7 @@ name: Generate or update list of tracked links
on:
schedule:
- cron: '0 * * * *'
push:
branches:
- main
- cron: '* * * * *'
jobs:
make_tracked_links_file:

View file

@ -20,7 +20,7 @@ Copy of Telegram websites stored **[here](https://github.com/MarshalX/telegram-c
### How it works
1. [Link crawling](make_tracked_links_list.py) runs once an hour.
1. [Link crawling](make_tracked_links_list.py) runs **as often as possible**.
Starts crawling from the home page of the site.
Detects relative and absolute sub links and recursively repeats the operation.
Writes a list of unique links for future content comparison.
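   A minimal sketch of that crawl step, assuming an aiohttp-based fetcher (the repository already depends on aiohttp); the start URL, parsing logic and function names below are illustrative, not the actual [make_tracked_links_list.py](make_tracked_links_list.py) implementation:
   ```python
   import asyncio
   from urllib.parse import urljoin, urlparse

   import aiohttp

   START_URL = 'https://core.telegram.org/'  # illustrative start page, not the real crawl list


   async def crawl(url, session, visited):
       if url in visited:  # keep the collected links unique
           return
       visited.add(url)

       async with session.get(url) as response:
           if 'text/html' not in response.headers.get('Content-Type', ''):
               return
           html = await response.text()

       # naive href extraction; the real script applies its own parsing and filtering rules
       for chunk in html.split('href="')[1:]:
           link = chunk.split('"', 1)[0]
           absolute = urljoin(url, link)  # resolves relative and absolute sub links alike
           if urlparse(absolute).netloc == urlparse(START_URL).netloc:
               await crawl(absolute, session, visited)


   async def main():
       visited = set()
       async with aiohttp.ClientSession() as session:
           await crawl(START_URL, session, visited)
       # write the list of unique links for future content comparison
       with open('tracked_links.txt', 'w') as f:
           f.write('\n'.join(sorted(visited)))


   asyncio.run(main())
   ```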
@ -44,12 +44,32 @@ Copy of Telegram websites stored **[here](https://github.com/MarshalX/telegram-c
### FAQ
**Q:** How many is "**as often as possible**"?
**Q:** How often is "**as often as possible**"?
**A:** TL;DR: the content update action runs every ~10 minutes. More info:
- [Scheduled actions cannot be run more than once every 5 minutes.](https://github.blog/changelog/2019-11-01-github-actions-scheduled-jobs-maximum-frequency-is-changing/)
- [GitHub Actions workflow not triggering at scheduled time](https://upptime.js.org/blog/2021/01/22/github-actions-schedule-not-working/).
**Q:** Why are there two separate crawl scripts instead of one?
**A:** Because the original idea was to update the tracked links once an hour.
It was convenient to use separate scripts and workflows.
After the Telegram 7.7 update, I realised that discovering new blog posts that slowly is a bad idea.
**Q:** Why does the script for sending alerts have a while loop?
**A:** Because the GitHub API doesn't return information about a commit immediately
after a push to the repository. Therefore, the script waits for the information to appear...
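A minimal sketch of such a waiting loop; the endpoint constant, ref and retry delay are assumptions, not the actual code from [make_and_send_alert.py](make_and_send_alert.py):
```python
import asyncio

import aiohttp

# illustrative endpoint; the real script builds its URL from its own constants
COMMIT_URL = 'https://api.github.com/repos/{repo}/commits/{ref}'


async def wait_for_commit(session, repo, ref, delay=5):
    # commit data is not available immediately after a push, so poll until it appears
    while True:
        async with session.get(COMMIT_URL.format(repo=repo, ref=ref)) as response:
            if response.status == 200:
                return await response.json()
        await asyncio.sleep(delay)


async def main():
    async with aiohttp.ClientSession() as session:
        commit = await wait_for_commit(session, 'MarshalX/telegram-crawler', 'main')
        print(commit['commit']['message'])


asyncio.run(main())
```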
**Q:** Why are you using a GitHub Personal Access Token in the actions/checkout workflow step?
**A:** To be able to trigger other workflows from the on-push trigger. More info:
- [Action does not trigger another on push tag action ](https://github.community/t/action-does-not-trigger-another-on-push-tag-action/17148)
**Q:** Why are you using a GitHub PAT in [make_and_send_alert.py](make_and_send_alert.py)?
**A:** To increase the GitHub API rate limits.
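As a rough illustration of the difference (unauthenticated requests are limited to 60 per hour, token-authenticated ones to 5,000); this snippet is a sketch, not the repository's code:
```python
import asyncio
import os

import aiohttp

GITHUB_PAT = os.environ['GITHUB_PAT']


async def main():
    headers = {'Authorization': f'token {GITHUB_PAT}'}
    async with aiohttp.ClientSession() as session:
        # the /rate_limit endpoint itself does not count against the quota
        async with session.get('https://api.github.com/rate_limit', headers=headers) as response:
            data = await response.json()
    # 'limit' reports 5000 per hour with a PAT and 60 per hour without one
    print(data['resources']['core']['limit'])


asyncio.run(main())
```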
### TODO list
- add storing history of content using hashes;

View file

@ -7,7 +7,7 @@ import aiohttp
COMMIT_SHA = os.environ['COMMIT_SHA']
TELEGRAM_BOT_TOKEN = os.environ['TELEGRAM_BOT_TOKEN']
GITHUB_PWA = os.environ['GITHUB_PWA']
GITHUB_PAT = os.environ['GITHUB_PAT']
REPOSITORY = os.environ.get('REPOSITORY', 'MarshalX/telegram-crawler')
CHAT_ID = os.environ.get('CHAT_ID', '@tgcrawl')
@ -58,7 +58,7 @@ async def main():
session=session,
url=f'{BASE_GITHUB_API}{GITHUB_LAST_COMMITS}'.format(repo=REPOSITORY, sha=COMMIT_SHA),
headers={
'Authorization': f'token {GITHUB_PWA}'
'Authorization': f'token {GITHUB_PAT}'
}
)