
Parses a web page to find a zip archive, parses the XML inside it, reformats the data, and outputs a CSV file


J-Bakuli/WebScraper


WebScraper

The WebScraper does the following:

  1. parses the site http://data.nalog.ru/opendata/7707329152-rafp/ to find the latest version of the dataset;
  2. extracts the required parameters from the XML file inside the downloaded zip archive;
  3. writes the result as a three-column CSV file:
  • the country where the headquarters is located;
  • the branch office status;
  • the number of such companies.
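
The three steps above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names, the `Branch` element, and its `country` and `status` attributes are hypothetical placeholders for the real XML schema, and the "latest version" rule (pick the lexicographically greatest `.zip` link) is also an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://data.nalog.ru/opendata/7707329152-rafp/"


class ZipLinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at a .zip file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".zip"):
                self.links.append(href)


def find_latest_zip_url(html, base_url=PAGE_URL):
    # Assumption: archive names embed a sortable date, so the
    # lexicographically greatest link is treated as the latest version.
    parser = ZipLinkCollector()
    parser.feed(html)
    if not parser.links:
        raise ValueError("no .zip link found on the page")
    return urljoin(base_url, max(parser.links))


def count_branches(zip_bytes):
    # Hypothetical schema: <Branch country="..." status="..."/> elements.
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            with archive.open(name) as xml_file:
                # iterparse streams the file, so large datasets stay out of memory.
                for _, elem in ET.iterparse(xml_file):
                    if elem.tag == "Branch":
                        key = (elem.get("country", ""), elem.get("status", ""))
                        counts[key] += 1
                        elem.clear()
    return counts


def write_csv(counts, out):
    # Three columns: headquarters country, branch status, number of companies.
    writer = csv.writer(out)
    writer.writerow(["country", "status", "companies"])
    for (country, status), n in sorted(counts.items()):
        writer.writerow([country, status, n])
```

To run it end to end, the page and the archive could be fetched with `urllib.request.urlopen(...).read()`, with the decoded page text passed to `find_latest_zip_url` and the archive bytes passed to `count_branches`. The download step is left out here so that the parsing and counting logic can be tried on local data first.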
