
Using the CVAT API to back up a task returns 202 #7878

Open
2 tasks done
youth123-c opened this issue May 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@youth123-c

youth123-c commented May 11, 2024

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

  1. Call /api/tasks?page=<n> to collect all task ids into a list.
  2. Pass each task_id from the list to the backup API in turn:
import logging
from datetime import datetime

import requests
from retrying import retry, RetryError
from tqdm import tqdm

# Note: retry_if_status_code_not_ok is defined elsewhere in the script (not shown here).


def backup_task(
    token: str,
    task_id: int,
    account_name: str,
    container_name: str,
    file_format: str,
    request_url: str,
):
    """Back up a single task, or every task when task_id is -1.
    Args:
        token: The token obtained after login.
        task_id: Task id to back up; -1 means back up all tasks.
        account_name: Azure Blob Storage account name.
        container_name: Azure Blob Storage container name.
        file_format: Type of annotation file.
        request_url: CVAT request url.
    """
    try:
        headers = {"Authorization": f"Token {token}"}
        page = 1

        if task_id == -1:
            task_ids = []
            download_all = True
        else:
            task_ids = [task_id]
            download_all = False

        while download_all:
            try:
                response = requests.get(
                    url=request_url + f"/api/tasks?page={page}",
                    headers=headers,
                    timeout=10,
                )
                response.raise_for_status()
            except requests.exceptions.HTTPError as err:
                logging.error("HTTP Error: %s", err)
                break
            except requests.exceptions.ConnectionError as err:
                logging.error("Error Connecting: %s", err)
                break
            except requests.exceptions.Timeout as err:
                logging.error("Timeout Error: %s", err)
                break
            except requests.exceptions.RequestException as err:
                logging.error("Oops: Something Else: %s", err)
                break
            tasks = response.json()
            if not tasks["results"]:
                break
            task_ids.extend([task["id"] for task in tasks["results"]])
            page += 1

        if task_ids:
            logging.info("Found %s tasks need to backup.", len(task_ids))
            failed_tasks = []
            for task_id in tqdm(
                task_ids, desc="Downloading tasks", unit="task", mininterval=5
            ):
                try:
                    success = download_annotation(
                        token,
                        task_id,
                        account_name,
                        container_name,
                        file_format,
                        request_url,
                    )
                except RetryError:
                    success = False
                if not success:
                    failed_tasks.append(task_id)
            if failed_tasks:
                logging.error("Failed to download tasks: %s", failed_tasks)
                return False
            else:
                logging.info("All tasks have been backed up.")
                return True
    except Exception as e:
        logging.error("An error occurred: %s", e)
    return False

@retry(
    retry_on_result=retry_if_status_code_not_ok,
    stop_max_attempt_number=5,
    wait_fixed=8000,
)
def download_annotation(
    download_token: str,
    download_task_id: int,
    account_name: str,
    container_name: str,
    file_format: str,
    request_url: str,
):
    """Download annotation.
    Args:
        download_token: Login token.
        download_task_id: Task id.
        account_name: Azure Blob Storage account name.
        container_name: Azure Blob Storage container name.
        file_format: Type of annotation file.
        request_url: CVAT request url.
    """
    task_url = request_url + "/api/tasks/" + str(download_task_id) + "/annotations"
    task_header = {"Authorization": "Token " + download_token}
    # Get the current date and time
    now = datetime.now()
    # Format the date and time
    dt_string = now.strftime("%y%m%d%H%M%S")
    # Create the filename
    filename = f"{download_task_id}_{dt_string}.zip"
    params = {"action": "download", "format": file_format, "filename": filename}
    response = requests.get(
        url=task_url, params=params, headers=task_header, verify=False, timeout=10
    )
    # 200 means the export finished and the archive was returned; 202 means the
    # export job was only accepted and is still being prepared on the server.
    if response.status_code == 200:
        return True
    return False

Expected Behavior

When I have a large task_id list (for example, 300 entries), around 30 of the task backups fail and the backup endpoint returns 202.

Possible Solution

I added a retry to the backup call, trying 5 times with an interval of 8 seconds between attempts. When I ran the script recently, I found that the same batch of task_ids failed every time.

Context

I want to solve this 202 return value problem so that a batch backup downloads all of the data.

Environment

Python 3.9
CVAT 2.1.0
@youth123-c youth123-c added the bug Something isn't working label May 11, 2024
@PushpakBhoge

@youth123-c The download APIs are normally async; 202 means the task was submitted, so it is actually a success response.

You will find the backup file on some other API; in the UI the JavaScript generally takes care of this. I don't know which the second API is in CVAT's case, but based on network observation you have to keep hitting the same request until you get 201, with the param action='download'.
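
For illustration, a minimal sketch of that polling pattern, assuming the endpoint keeps answering 202 while the export job is being prepared and a normal success code once the archive is ready; the poll_annotation_export name, attempt count and wait interval are placeholders, not CVAT's documented contract:

import time

import requests


def poll_annotation_export(request_url, token, task_id, file_format,
                           max_attempts=30, wait_seconds=5):
    """Re-issue the same export request until the server stops answering 202."""
    url = f"{request_url}/api/tasks/{task_id}/annotations"
    headers = {"Authorization": f"Token {token}"}
    params = {"action": "download", "format": file_format}
    for _ in range(max_attempts):
        response = requests.get(url, params=params, headers=headers, timeout=30)
        if response.status_code == 202:
            # Export is still being prepared on the server; wait and retry.
            time.sleep(wait_seconds)
            continue
        response.raise_for_status()
        return response.content  # the exported archive bytes
    raise TimeoutError(f"Export of task {task_id} did not finish after {max_attempts} attempts")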

Even if you put your request in a while loop with this param, I don't think it will work.
I would recommend using the CVAT SDK: pip install cvat-sdk==2.1.0
https://docs.cvat.ai/docs/api_sdk/sdk/highlevel-api/

Here is one example:

from cvat_sdk import make_client

# CVAT_HOST, CVAT_USER, CVAT_PASS, download_task_id and local_path are placeholders.
with make_client(host=CVAT_HOST, credentials=(CVAT_USER, CVAT_PASS)) as client:
    task_meta = client.tasks.retrieve(download_task_id)
    task_meta.export_dataset(format_name="CVAT for images 1.1", filename=local_path)

@youth123-c
Author

Thank you very much for your help; I will try it using the CVAT SDK.

@bsekachev
Member

Did you sort out how to work with the backup API?

@youth123-c
Author

 def download_annotation(
        self,
        download_task_id,
        account_name,
        container_name,
        file_format,
    ):
        """Download annotation with CVAT SDK.
        Args:
            download_task_id: Task id.
            account_name: Azure Blob Storage account name.
            container_name: Azure Blob Storage container name.
            file_format: Type of annotation file.
        The CVAT host and credentials are taken from self.request_url,
        self.username and self.password.
        """
        try:
            with make_client(
                host=self.request_url, credentials=(self.username, self.password)
            ) as client:
                task_meta = client.tasks.retrieve(download_task_id)
                # Get the current date and time
                now = datetime.now()
                # Format the date and time
                dt_string = now.strftime("%y%m%d%H%M%S")
                local_path = f"{download_task_id}_{dt_string}.zip"
                parsed_url = urlparse(self.request_url)
                hostname = parsed_url.hostname.split(".")[0]
                blob_path = f"cvat_backup/{hostname}/{download_task_id}/{local_path}"
                task_meta.export_dataset(format_name=file_format, filename=local_path)
                return True
        except Exception as e:
            logging.error("Error downloading annotations: %s", str(e))
            return False

I used the CVAT SDK to download the task, but I found that the export_dataset method downloads both images and annotations. If I only want to download the annotations, which method should I use?
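
For reference, a hedged sketch of an annotations-only export, assuming the include_images keyword that newer cvat-sdk releases expose on export_dataset (whether it is available in 2.1.0 is not verified here); the host, credential and task id names are placeholders:

from cvat_sdk import make_client

# Hedged sketch: include_images=False is assumed to skip the image files and
# keep only the annotation data in the exported archive.
with make_client(host=CVAT_HOST, credentials=(CVAT_USER, CVAT_PASS)) as client:
    task = client.tasks.retrieve(download_task_id)
    task.export_dataset(
        format_name="CVAT for images 1.1",
        filename="annotations_only.zip",
        include_images=False,
    )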
