mirror of https://gitlab.com/moepoi/journalscrapper.git synced 2024-12-22 14:35:01 +01:00

Moe 52d0b07b63 Fix Typo		2022-08-22 10:53:46 +07:00
scrapper	Fix Typo	2022-08-22 10:53:46 +07:00
.gitignore	update .gitignore	2022-08-22 10:52:25 +07:00
Dockerfile	Initial commit	2022-08-18 10:33:29 +07:00
main.py	Initial commit	2022-08-18 10:33:29 +07:00
README.md	Initial commit	2022-08-18 10:33:29 +07:00
requirements.txt	Initial commit	2022-08-18 10:33:29 +07:00

Journal Scrapper

Installation

Docker

$ docker run --name JournalScrapper -d -p 5000:5000 registry.gitlab.com/moepoi/journalscrapper:latest

Manual

$ mkdir JournalScrapper && cd JournalScrapper && git clone https://gitlab.com/moepoi/journalscrapper.git .
$ pip3 install -r requirements.txt
$ python3 main.py

Usage

Get Users

$ curl 'localhost:5000/getusers?name=viny'

[
  {
      "name": "VINY CHRISTANTI MAWARDI",
      "id": "5990793",
      "type": "Teknik Informatika (S1)",
      "image": "https://scholar.google.co.id/citations?view_op=view_photo&user=hayqUI0AAAAJ&citpid=1"
  }
]
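
The same request can be made from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at `localhost:5000` as in the installation steps; `BASE_URL`, `build_url`, and `get_users` are illustrative names and not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server was started as described under Installation.
BASE_URL = "http://localhost:5000"


def build_url(endpoint, **params):
    """Return the full URL for an endpoint, percent-encoding the query values."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def get_users(name):
    """Call /getusers and return the decoded JSON list (hypothetical helper)."""
    with urlopen(build_url("getusers", name=name), timeout=30) as response:
        return json.load(response)
```

With the server running, `get_users("viny")` should return a list shaped like the response above. Building the query string with `urlencode` keeps names that contain spaces or other special characters safe to send.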

Get User

$ curl 'localhost:5000/getuser?id=5990793'

{
  "name": "VINY CHRISTANTI MAWARDI",
  "id": "5990793",
  "type": "S1 - Teknik Informatika",
  "image": "https://scholar.google.co.id/citations?view_op=view_photo&user=hayqUI0AAAAJ&citpid=1",
  "gscholar_id": "hayqUI0AAAAJ",
  "affiliation": "Universitas Tarumanagara",
  "subject": [
      "Information Retrieval"
  ],
  "sinta_score_overall": "438",
  "sinta_score_3yrs": "94",
  "affil_score": "0",
  "affil_score_3yrs": "0",
  "summary": {
      "article": {
      "scopus": "7",
      "gscholar": "160",
      "wos": "0"
      },
      "citation": {
      "scopus": "22",
      "gscholar": "116",
      "wos": "0"
      },
      "cited_document": {
      "scopus": "5",
      "gscholar": "33",
      "wos": "0"
      },
      "h_index": {
      "scopus": "3",
      "gscholar": "6",
      "wos": ""
      },
      "i10_index": {
      "scopus": "1",
      "gscholar": "3",
      "wos": ""
      },
      "g_index": {
      "scopus": "1",
      "gscholar": "1",
      "wos": ""
      }
  }
}
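
In the sample response every score and count is a string, and some `wos` values are empty. A client that wants to do arithmetic on them has to convert first. The sketch below shows one way to do that, assuming an empty string should be treated as a default value; `to_int` and `metric_by_source` are hypothetical helpers, and the data is a trimmed copy of the response above.

```python
def to_int(value, default=0):
    """Convert a numeric string such as "438" to int; empty strings become `default`."""
    text = str(value).strip()
    return int(text) if text else default


def metric_by_source(user, metric):
    """Return {source: int} for one entry of user["summary"], e.g. "citation"."""
    return {source: to_int(count) for source, count in user["summary"][metric].items()}


# Trimmed copy of the sample /getuser response.
sample_user = {
    "sinta_score_overall": "438",
    "summary": {
        "citation": {"scopus": "22", "gscholar": "116", "wos": "0"},
        "h_index": {"scopus": "3", "gscholar": "6", "wos": ""},
    },
}

citations = metric_by_source(sample_user, "citation")
h_index = metric_by_source(sample_user, "h_index")
```

Note that `to_int` raises `ValueError` for non-numeric text. Whether an empty value means zero or "not available" is a choice for the caller; pass `default=None` to keep the two cases apart.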

Get Citations

$ curl 'localhost:5000/getcitations?id=hayqUI0AAAAJ'

[
  {
      "title": "Fast and accurate spelling correction using trie and Damerau-levenshtein distance bigram",
      "id": "hayqUI0AAAAJ:TFP_iSt0sucC",
      "author": "VM Christanti, DS Naga",
      "journal": "Telkomnika 16 (2), 827-833",
      "year": "2018"
  },
  {
      "title": "Automatic essay scoring in E-learning system using LSA method with N-gram feature for Bahasa Indonesia",
      "id": "hayqUI0AAAAJ:k_IJM867U9cC",
      "author": "RS Citawan, VC Mawardi, B Mulyawan",
      "journal": "MATEC web of conferences 164, 01037",
      "year": "2018"
  },
  {
      "title": "Content-based image retrieval using convolutional neural networks",
      "id": "hayqUI0AAAAJ:SeFeTyx0c_EC",
      "author": "Z Rian, V Christanti, J Hendryli",
      "journal": "2019 IEEE International Conference on Signals and Systems (ICSigSys), 1-7",
      "year": "2019"
  }
  //...........
]
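
Each `id` in the sample appears to be the `gscholar_id` followed by a colon and a per-paper key, and it is the value passed to `getcitation`. The sketch below splits such an id and groups titles by year. It assumes that id layout holds for every entry, which the samples suggest but the README does not state; the helper names are illustrative and the titles are shortened.

```python
from collections import defaultdict


def split_citation_id(citation_id):
    """Split "scholar_id:paper_key" at the first colon (assumed id layout)."""
    scholar_id, _, paper_key = citation_id.partition(":")
    return scholar_id, paper_key


def titles_by_year(citations):
    """Group citation titles under their "year" string."""
    years = defaultdict(list)
    for citation in citations:
        years[citation["year"]].append(citation["title"])
    return dict(years)


# Shortened copy of the sample /getcitations response.
sample = [
    {"title": "Fast and accurate spelling correction", "id": "hayqUI0AAAAJ:TFP_iSt0sucC", "year": "2018"},
    {"title": "Automatic essay scoring in E-learning system", "id": "hayqUI0AAAAJ:k_IJM867U9cC", "year": "2018"},
    {"title": "Content-based image retrieval", "id": "hayqUI0AAAAJ:SeFeTyx0c_EC", "year": "2019"},
]

grouped = titles_by_year(sample)
```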

Get Citation

$ curl 'localhost:5000/getcitation?id=hayqUI0AAAAJ:hkOj_22Ku90C'

{
  "title": "Aplikasi Clustering Berita dengan Metode K Means dan Peringkas Berita dengan Metode Maximum Marginal Relevance",
  "url": "https://journal.untar.ac.id/index.php/jiksi/article/view/11560",
  "info": {
      "authors": "Edy Susanto, Viny Christanti Mawardi, Manatap Dolok Lauro",
      "publication date": "2021",
      "journal": "Jurnal Ilmu Komputer dan Sistem Informasi",
      "volume": "9",
      "issue": "1",
      "pages": "62-68",
      "description": "News is information about facts or opinions that are interesting to know. News can be obtained from various media such as newspapers and the internet. As is well known, news has various topics, such as politics, sports and others. There is also the same story written with the addition of a little information. This causes it to take more time to get the headline of the news. Therefore we need a system for news clustering using the K-Means method and news summarizing using the Maximum Marginal Relevance (MMR) method in order to obtain information from news more easily and efficiently. News that is processed in the form of a collection of files (multi document) with the extension txt. The summarization process goes through the text preprocessing stage, which consists of sentence segmentation, case folding, tokenizing, filtering, stemming. The next step is TF-IDF calculation to calculate word weight then Cosine Similarity to calculate the similarity between documents. After that, enter the K-Means stage for clustering division and proceed with determining the summary with MMR. Based on the results testing that has been done, this application is running well, the results of clustering and summarizing news can make it easier for users to get news summaries from some similar news."
  },
  "download": true,
  "download_link": "http://journal.untar.ac.id/index.php/jiksi/article/download/11560/7233"
}
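
The `download` flag and `download_link` field can be used to decide whether a full-text file is available. This is a small sketch, assuming `download_link` may be missing or empty when `download` is false (the README only shows the `true` case); `download_target` is a hypothetical helper.

```python
def download_target(citation):
    """Return the download link when the record is flagged downloadable, else None."""
    if citation.get("download") and citation.get("download_link"):
        return citation["download_link"]
    return None


# Trimmed copy of the sample /getcitation response.
sample_citation = {
    "url": "https://journal.untar.ac.id/index.php/jiksi/article/view/11560",
    "download": True,
    "download_link": "http://journal.untar.ac.id/index.php/jiksi/article/download/11560/7233",
}

link = download_target(sample_citation)
```

A caller can then pass a non-`None` result to a downloader such as `urllib.request.urlretrieve`.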