Showcases visualizations of the average hotel price in Osaka. The data was collected from Booking.com.

sakan811/Find-Osaka-Average-Hotel-Price

Find the Hotel's Average Room Price in Osaka

Showcases visualizations of the average hotel room price in Osaka.

Status

CodeQL
Scraper Test
Scrape

Visualizations

Power BI

Data as of May 19, 2024
Instagram
Facebook

Project Details

Collect Osaka hotel property data from Booking.com

Data collection start date: May 16th, 2024.

Data was collected weekly using GitHub Actions with automated_scraper.py.

This script can also be used to scrape data from other cities.

Code Base Details

To scrape hotel data

  • Go to set_details.py
  • Set the parameters of the 'Details' dataclass as needed.
    • Example:
    from dataclasses import dataclass

    @dataclass
    class Details:
        # Set booking details.
        city: str = 'Osaka'

        # Check-in and check-out are used only by the Basic Scraper.
        check_in: str = '2024-12-01'
        check_out: str = '2024-12-12'

        group_adults: int = 1
        num_rooms: int = 1
        group_children: int = 0
        selected_currency: str = 'USD'

        # Optional
        # Set the start date and number of nights when using
        # the Thread Pool Scraper or the Month End Scraper.
        start_day: int = 1
        month: int = 12
        year: int = 2024
        nights: int = 1

        # Set SQLite database name
        sqlite_name: str = 'test.db'
    
  • To scrape using the Thread Pool Scraper:
    • Run the following command in a terminal:
      python main.py --thread_pool=True
      
    • Scrapes data from the given start date to the end of the same month.
      • Scrapes five dates at the same time.
  • To scrape using the Month End Scraper:
    • Run the following command in a terminal:
      python main.py --month_end=True
      
    • Scrapes data from the given start date to the end of the same month.
  • To scrape using the Basic Scraper:
    • Run the following command in a terminal:
      python main.py 
      
    • Scrapes data based on the given check-in and check-out dates.
  • Data is saved to CSV by default.
    • Add --to_sqlite=True to save the data to a SQLite database:
    python main.py --to_sqlite=True
    
  • The month to scrape can be specified with --month=(month number as int) for the Thread Pool Scraper and the Month End Scraper.
    • For example, to scrape data for June of the current year using the Thread Pool Scraper, run the following command:
    python main.py --thread_pool=True --month=6
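
The flags above take explicit values such as --thread_pool=True, which plain `type=bool` in argparse would misread (any non-empty string, including 'False', is truthy). The following is a minimal sketch of one way such flags could be parsed. It is not the project's actual main.py: `str_to_bool`, `build_parser`, and `choose_scraper` are hypothetical names, and rejecting both scraper flags at once is an assumption.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert a command-line string such as 'True' or 'false' to a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Hotel scraper options")
    parser.add_argument("--thread_pool", type=str_to_bool, default=False)
    parser.add_argument("--month_end", type=str_to_bool, default=False)
    parser.add_argument("--to_sqlite", type=str_to_bool, default=False)
    parser.add_argument("--month", type=int, default=None)
    return parser


def choose_scraper(args: argparse.Namespace) -> str:
    """Return which scraper the flags select; Basic is the fallback."""
    # Assumption: the two scraper flags are mutually exclusive.
    if args.thread_pool and args.month_end:
        raise ValueError("choose only one of --thread_pool and --month_end")
    if args.thread_pool:
        return "thread_pool"
    if args.month_end:
        return "month_end"
    return "basic"
```

With this sketch, `build_parser().parse_args(["--thread_pool=True", "--month=6"])` yields a namespace for which `choose_scraper` returns "thread_pool".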
    

Dataclass

set_details.py

  • Dataclass that stores the booking details, dates, and length of stay.
    • Specifies which hotel data to scrape.

migrate_to_sqlite.py

  • Migrates data to a SQLite table using the sqlite3 module.
    • Creates a SQLite database named 'avg_japan_hotel_price.db'.
  • Creates a view using the sqlite3 module.
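
As a rough illustration of the table-plus-view pattern described for migrate_to_sqlite.py, the sketch below uses only the standard sqlite3 module. The table name `HotelPrice`, the view name `AveragePriceByDate`, the column names, and the sample rows are all hypothetical; the project's real schema is not shown in this README.

```python
import sqlite3

# Hypothetical rows: (hotel name, price per night, check-in date).
rows = [
    ("Hotel A", 100.0, "2024-12-01"),
    ("Hotel B", 140.0, "2024-12-01"),
    ("Hotel A", 120.0, "2024-12-02"),
]

# An in-memory database keeps the sketch self-contained; a file name such as
# 'avg_japan_hotel_price.db' would be passed here to persist the data.
conn = sqlite3.connect(":memory:")

# Create the table that receives the scraped rows.
conn.execute(
    "CREATE TABLE IF NOT EXISTS HotelPrice ("
    "Hotel TEXT NOT NULL, Price REAL NOT NULL, CheckIn TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO HotelPrice (Hotel, Price, CheckIn) VALUES (?, ?, ?)", rows
)

# Create a view that exposes the average price for each check-in date.
conn.execute(
    "CREATE VIEW IF NOT EXISTS AveragePriceByDate AS "
    "SELECT CheckIn, AVG(Price) AS AveragePrice "
    "FROM HotelPrice GROUP BY CheckIn"
)
conn.commit()

averages = conn.execute(
    "SELECT CheckIn, AveragePrice FROM AveragePriceByDate ORDER BY CheckIn"
).fetchall()
conn.close()
```

A view keeps the aggregation in the database, so a reporting tool can query the averages without reimplementing the GROUP BY.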

scrape.py

  • Scrape data from Booking.com website.

scrape_until_month_end.py

  • Scrapes data for each date.
    • Starts from the given start date and continues until the end of the same month.
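
The date range described here can be computed with the standard library. The sketch below is an assumption about how the check-in and check-out pairs might be generated from the start_day, month, year, and nights parameters; `dates_until_month_end` and `stay_windows` are hypothetical helper names, not functions from the repository.

```python
import calendar
from datetime import date, timedelta


def dates_until_month_end(year: int, month: int, start_day: int) -> list[date]:
    """List every date from start_day through the last day of the month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in month
    return [date(year, month, day) for day in range(start_day, last_day + 1)]


def stay_windows(
    year: int, month: int, start_day: int, nights: int
) -> list[tuple[str, str]]:
    """Pair each check-in date with a check-out date `nights` later."""
    return [
        (check_in.isoformat(), (check_in + timedelta(days=nights)).isoformat())
        for check_in in dates_until_month_end(year, month, start_day)
    ]
```

Note that a check-in on the last day of the month produces a check-out date in the following month, which the date arithmetic handles automatically.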

thread_scrape.py

  • Scrapes data for five dates at the same time using a thread pool executor.
    • Starts from the given start date and continues until the end of the same month.
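
Scraping five dates at once can be sketched with `concurrent.futures.ThreadPoolExecutor` and `max_workers=5`. This is only an illustration of the pattern: `scrape_one_date` is a hypothetical placeholder that returns a dummy record instead of contacting Booking.com, and the repository's real function names may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_one_date(check_in: str) -> dict:
    """Placeholder for a real scrape; returns a dummy record for the date."""
    return {"check_in": check_in, "hotels_found": 0}


def scrape_dates_concurrently(
    check_ins: list[str], max_workers: int = 5
) -> list[dict]:
    """Scrape at most max_workers dates at a time, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs, even if
        # the individual tasks finish out of order.
        return list(executor.map(scrape_one_date, check_ins))
```

Threads suit this workload because scraping is dominated by waiting on network responses rather than by CPU work.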

utils.py

  • Contains utility functions.

Automated Hotel Scraper

automated_scraper.py

  • Scrapes Osaka hotel data daily using GitHub Actions for all 12 months.
    • Saves the data to a separate CSV file for each month.
  • Saves the CSV files to Google Cloud Storage.
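
The per-month CSV step could look roughly like the sketch below, which groups records by the 'YYYY-MM' prefix of the check-in date and writes one file per month with the standard csv module. The record fields, the `osaka_YYYY-MM.csv` file naming, and the `save_monthly_csvs` helper are assumptions made for illustration; the upload to Google Cloud Storage is left out.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical column layout for a scraped record.
FIELDNAMES = ["hotel", "price", "check_in"]


def save_monthly_csvs(records: list[dict], out_dir: Path) -> list[Path]:
    """Write one CSV per month and return the paths in chronological order."""
    by_month: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        by_month[record["check_in"][:7]].append(record)  # key is 'YYYY-MM'

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for month_key in sorted(by_month):
        path = out_dir / f"osaka_{month_key}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
            writer.writeheader()
            writer.writerows(by_month[month_key])
        written.append(path)
    return written
```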