Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TODO] adapt to run in google collab with google sheets as datasource #14

Open
sumanashrestha opened this issue Apr 20, 2024 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@sumanashrestha
Copy link
Collaborator

The script runs in collab but requires manually uploading files to collab vm. Adapt the script such that it takes input from google sheet and outputs to sheet

The goal is to make this easy to run for non technical folks directly from a browser without having to setup anything

@sumanashrestha sumanashrestha added the help wanted Extra attention is needed label Apr 20, 2024
@bsanjok
Copy link

bsanjok commented Apr 20, 2024

I have created a google colab document based on the current code. Google codelab doesn't support GUI based file selection from the google drive. Hence, files cannot be selected unless url is pasted manually. That is more confusing. Hence I implemented local file upload support. A copy of timestamped output is saved in google drive and downloaded locally at the end:
Google Colab

Removed

  • Removed argparse

Implemented

  • google drive authentication for drive directory access
  • local tsv file selection prompts
  • google drive timestamped folder creation and output upload
  • local download of final output

Need to Implement

  • tsv file validation
  • graceful handling of keyboard interruptions

@sumanashrestha
Copy link
Collaborator Author

sumanashrestha commented Apr 21, 2024

This looks promising. Would like to retain the CLI interface as well. Is there a way to share code between the CLI interface and the collab one, it seems we are copying code to ipynb. we could refactor to separate core logic from CLI stuff to separate files. Is is possible to source collab code block directly from github raw url ?

We could have a single google spreadsheet file with three sheets - schools, centers, prefs. This would be mean the entire workflow is browser based. Having to copy a single url to a variable to achieve this may be an easy enough solution

#19

@amitness
Copy link

@sumanashrestha It's possible to share the code between the two. Instead of copying the code to the notebook, you can simply clone the github repo on the colab notebook and run the cli directly.

I have implemented a version, where the data is present in the Google sheet here and the notebook automatically downloads the tsv files from the sheet and runs the code from this repo. If the google sheet is modified, you just run the code again.

Here is the colab notebook

You just have to make sure that for the sheet you use, you turn on sharing to "Anyone with the link" can view.
image

This doesn't mean the sheet is public to everyone. Only those with the link can view it. As such, you can keep a copy of the sheet privately in the ministry's account if that's the intention. Alternatively, you can also make the sheet public, but only give edit access to authorized personnel in the ministry. With that, people can suggest changes via comments, but only authorized personnel can make final changes.

@thesushilsharma
Copy link

Use gspread without API Authentication

You can also use gspread without API authentication if you're working with your own Google account and do not need to access private spreadsheets.

Pre-requisites

  • Save files in Google Drive (Python library will search the entire Google Drive for sheet names)
  • file type: Google Sheets
    image

Code Considerations

  • Configure Google Sheet Access

import gspread
from google.auth import default 
def authorize_google_sheets():
    """Authorize and return the gspread client."""
    #Authenticate google sheet
    
    creds, _ = default()
    return gspread.authorize(creds)
  • Reading School and Center data

def read_google_sheet(sheet_name: str) -> List[Dict[str, str]]:
    """Read data from a Google Sheet."""
    gc = authorize_google_sheets()
    sheet = gc.open(sheet_name).sheet1
    return sheet.get_all_records()
  • Reading Preference data

def read_google_sheet_prefs(sheet_name: str) -> Dict[str, Dict[str, int]]:
    """Read preference data from a Google Sheet."""
    prefs = {}
    gc = authorize_google_sheets()
    sheet = gc.open(sheet_name).sheet1
    data = sheet.get_all_records()

    for row in data:
        scode = row.get('scode', '')
        cscode = row.get('cscode', '')
        pref = int(row.get('pref', 0))
        if scode:
            prefs.setdefault(scode, {}).setdefault(cscode, 0)
            prefs[scode][cscode] += pref
    
    return prefs
  • Other Code Sections

schools = sorted(read_google_sheet('schools_grade12_2081'), key= school_sort_key)
centers = read_google_sheet('centers_grade12_2081')
prefs = read_google_sheet_prefs('prefs')

We can offer the option to customize sheet names dynamically by either prompting the user or utilizing environment variables.
When using 'gspread' with API authentication, you will need to create a service account and obtain a JSON key file from the Google Cloud Console. This may be more complicated for non-techies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants