mirror of
https://github.com/MarshalX/telegram-crawler.git
synced 2025-03-14 04:51:42 +01:00
extend readme
parent
98590aa3f2
commit
3279341ed3
2 changed files with 46 additions and 2 deletions
1
.github/FUNDING.yml
vendored
Normal file
@@ -0,0 +1 @@
patreon: MarshalX
47
README.md
@@ -4,9 +4,12 @@ This project is developed to automatically detect changes made
to the official Telegram sites. This is necessary for anticipating
future updates and other things (new vacancies, API updates).

**– [](https://github.com/MarshalX/telegram-crawler/actions/workflows/make_files_tree.yml) [Site updates tracker](https://github.com/MarshalX/telegram-crawler/commits/data)**

**– [](https://github.com/MarshalX/telegram-crawler/actions/workflows/make_tracked_links_list.yml) [Site links tracker](https://github.com/MarshalX/telegram-crawler/commits/main/tracked_links.txt)**

| Name | Commits | Status |
| ---- | ------- | ------ |
| Site updates tracker | [Commits](https://github.com/MarshalX/telegram-crawler/commits/data) |  |
| Site links tracker | [Commits](https://github.com/MarshalX/telegram-crawler/commits/main/tracked_links.txt) |  |

### How it should work in dreams

@@ -48,6 +51,46 @@ future updates and other things (new vacancies, API updates).
- bug fixes;
- alert system.

### Example of link crawler rules configuration

```python
CRAWL_RULES = {
    # every rule is a regex
    # an empty string matches any URL
    # allow rules have higher priority than deny rules
    'translations.telegram.org': {
        'allow': {
            r'^[^/]*$',  # root
            r'org/[^/]*/$',  # 1 lvl sub
            r'/en/[a-z_]+/$'  # 1 lvl after /en/
        },
        'deny': {
            '',  # all
        }
    },
    'bugs.telegram.org': {
        'deny': {
            '',  # deny the whole subdomain
        },
    },
}
```
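
The rules above could be evaluated roughly as follows. This is only a sketch, not the crawler's actual code: the helper name `should_crawl` is hypothetical, and it assumes that URLs are matched without a scheme (which the `# root` pattern suggests), that patterns are applied with `re.search`, and that hosts without any rules are unrestricted.

```python
import re

# Same rules as in the example above, repeated so the sketch is self-contained.
CRAWL_RULES = {
    'translations.telegram.org': {
        'allow': {
            r'^[^/]*$',  # root
            r'org/[^/]*/$',  # 1 lvl sub
            r'/en/[a-z_]+/$',  # 1 lvl after /en/
        },
        'deny': {
            '',  # all
        },
    },
    'bugs.telegram.org': {
        'deny': {
            '',  # deny the whole subdomain
        },
    },
}


def should_crawl(url: str) -> bool:
    """Hypothetical helper: decide whether a scheme-less URL may be crawled."""
    host = url.split('/', 1)[0]
    rules = CRAWL_RULES.get(host)
    if rules is None:
        return True  # assumption: hosts without rules are not restricted

    # Allow rules are checked first, so they win over deny rules.
    if any(re.search(pattern, url) for pattern in rules.get('allow', ())):
        return True
    # An empty pattern matches every URL, so '' denies everything else.
    if any(re.search(pattern, url) for pattern in rules.get('deny', ())):
        return False
    return True
```

With these assumptions, `translations.telegram.org/en/android/` is allowed by the third rule, while a deeper page such as `translations.telegram.org/en/android/login/` matches no allow rule and falls through to the catch-all deny.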

### Current hidden urls list

```python
HIDDEN_URLS = {
    # 'corefork.telegram.org',  # disabled

    'telegram.org/privacy/gmailbot',
    'telegram.org/tos',
    'telegram.org/tour',
    'telegram.org/evolution',

    'desktop.telegram.org/changelog',
}
```
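
One way such a list might be used is as a set of extra starting points for the crawl. The sketch below rests on that assumption (that "hidden" means pages the crawler would not reach by following links); the function `seed_urls` and the default `https` scheme are hypothetical and not taken from the project.

```python
# Same entries as in the list above, repeated so the sketch is self-contained.
HIDDEN_URLS = {
    'telegram.org/privacy/gmailbot',
    'telegram.org/tos',
    'telegram.org/tour',
    'telegram.org/evolution',
    'desktop.telegram.org/changelog',
}


def seed_urls(hidden_urls, scheme='https'):
    """Hypothetical helper: turn scheme-less entries into absolute URLs."""
    # Sorting keeps the output stable between runs, since sets are unordered.
    return sorted(f'{scheme}://{url}' for url in hidden_urls)
```

Calling `seed_urls(HIDDEN_URLS)` would then give the crawler a deterministic list of absolute URLs to start from.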

### License

Licensed under the [MIT License](LICENSE).