Mirror of https://github.com/MarshalX/telegram-crawler.git (synced 2025-03-15 13:22:43 +01:00)

update readme

parent 4bda55cfb9
commit e217c3f5ec

2 changed files with 28 additions and 20 deletions
--- a/.github/workflows/make_files_tree.yml
+++ b/.github/workflows/make_files_tree.yml
@@ -1,4 +1,4 @@
-name: Fetch new content of tracked links to files
+name: Fetch new content of tracked links and files
 
 on:
   schedule:
@@ -9,7 +9,7 @@ on:
       - main
 
 jobs:
-  make_tracked_links_file:
+  fetch_new_content:
     name: Make files tree
     runs-on: macos-10.15
 
--- a/README.md
+++ b/README.md
@@ -1,22 +1,22 @@
-## 🕷 Telegram Web Crawler
+## 🕷 Telegram Crawler
 
 This project is developed to automatically detect changes made
-to the official Telegram sites. This is necessary for anticipating
-future updates and other things (new vacancies, API updates, etc).
+to the official Telegram sites and beta clients. This is necessary for
+anticipating future updates and other things
+(new vacancies, API updates, etc).
 
-
-| Name | Commits | Status |
-| -----| -------- | ------ |
-| Site updates tracker| [Commits](https://github.com/MarshalX/telegram-crawler/commits/data) |  |
-| Site links tracker | [Commits](https://github.com/MarshalX/telegram-crawler/commits/main/tracked_links.txt) |  |
+| Name | Commits | Status |
+|----------------------| -------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Data tracker | [Commits](https://github.com/MarshalX/telegram-crawler/commits/data) |  |
+| Site links collector | [Commits](https://github.com/MarshalX/telegram-crawler/commits/main/tracked_links.txt) |  |
 
 * ✅ passing – new changes
 * ❌ failing – no changes
 
 You should to subscribe to **[channel with alerts](https://t.me/tgcrawl)** to stay updated.
-Copy of Telegram websites stored **[here](https://github.com/MarshalX/telegram-crawler/tree/data/data)**.
+Copy of Telegram websites and client`s resources stored **[here](https://github.com/MarshalX/telegram-crawler/tree/data/data)**.
 
-
+
 
 ### How it works
 
@@ -32,16 +32,18 @@ Copy of Telegram websites stored **[here](https://github.com/MarshalX/telegram-c
 2. [Content crawling](make_files_tree.py) is launched **as often as
 possible** and uses the existing list of links collected in step 1.
 Going through the base it gets contains and builds a system of subfolders
-and files. Removes all dynamic content from files.
-
+and files. Removes all dynamic content from files. It downloads beta version
+of Android Client, decompiles it and track resources also. Tracking of
+resources of Telegram for macOS presented too.
+
 3. Using of [GitHub Actions](.github/workflows/). Works without own servers.
 You can just fork this repository and own tracker system by yourself.
 Workflows launch scripts and commit changes. All file changes are tracked
-by the GIT and beautifully displayed on the GitHub. GitHub Actions
-should be built correctly only if there are changes on the Telegram website.
-Otherwise, the workflow should fail. If build was successful, we can
-send notifications to Telegram channel and so on.
-
+by GIT and beautifully displayed on GitHub. GitHub Actions should be built
+correctly only if there are changes on the Telegram website. Otherwise, the
+workflow should fail. If build was successful, we can send notifications to
+Telegram channel and so on.
+
 ### FAQ
 
 **Q:** How often is "**as often as possible**"?
@@ -70,9 +72,15 @@ after push to repository. Therefore, script are waiting for information to appea
 
 **A:** To increase limits of GitHub API.
 
+**Q:** Why are you decompiling .apk file each run?
+
+**A:** Because it doesn't require much time. I am decompiling only
+resources (-s flag of apktool to disable disassembly of dex files).
+Writing a check for the need for decompilation by the hash of the apk file
+would take more time.
+
 ### TODO list
 
 - add storing history of content using hashes;
 - add storing hashes of image, svg, video.
 
 ### Example of link crawler rules configuration
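
The README text in this diff says a workflow run should pass only when there are changes on the tracked sites and fail otherwise. One way that convention could be wired up is sketched below. This is not taken from the repository: the function names `exit_code_for` and `working_tree_exit_code` are hypothetical, and using `git status --porcelain` as the change signal is an assumption.

```python
import subprocess


def exit_code_for(porcelain_output: str) -> int:
    """Map `git status --porcelain` output to a process exit code.

    Hypothetical helper: returns 0 (job passes) when at least one path
    changed, and 1 (job fails) when the working tree is clean.
    """
    changed = [line for line in porcelain_output.splitlines() if line.strip()]
    return 0 if changed else 1


def working_tree_exit_code() -> int:
    """Run git in the current directory and translate the result."""
    # `git status --porcelain` prints one line per changed path
    # and prints nothing when the tree is clean.
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return exit_code_for(result.stdout)
```

A workflow step could end with `sys.exit(working_tree_exit_code())` after the crawl, so that the job status matches the legend in the README (passing for new changes, failing for none).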
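
The FAQ entry added in this commit explains that the .apk is decompiled on every run, with `-s` so that apktool skips dex disassembly, and that a hash-based check was left out because it would take more time to write. For readers curious what such a check might look like, here is a minimal sketch. It is not part of the repository: `sha256_of`, `needs_decompile`, `apktool_command` and the idea of keeping the last hash in a plain text file are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large APK is never fully held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_decompile(apk: Path, hash_file: Path) -> bool:
    """Return True when the APK differs from the one seen on the last run.

    Hypothetical helper: the previous hash lives in `hash_file`, which is
    rewritten whenever the APK changes.
    """
    current = sha256_of(apk)
    previous = hash_file.read_text().strip() if hash_file.exists() else None
    if current == previous:
        return False
    hash_file.write_text(current)
    return True


def apktool_command(apk: Path, out_dir: Path) -> list[str]:
    """Build an apktool decode command that extracts resources only."""
    # d = decode, -s = do not disassemble dex files,
    # -f = overwrite the output directory, -o = output directory.
    return ["apktool", "d", "-s", "-f", str(apk), "-o", str(out_dir)]
```

A caller would pass the result of `apktool_command` to `subprocess.run` only when `needs_decompile` returns True. Whether this saves time in practice depends on how often the beta build changes, which the source does not state.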