
Using the CVAT API to back up a task returns 202 #7878

Open
2 tasks done
youth123-c opened this issue May 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@youth123-c

youth123-c commented May 11, 2024

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

  1. Call /api/tasks?page=<n> to collect all task ids into a list.
  2. Pass each task_id from the list to the backup API in turn:
import logging
from datetime import datetime

import requests
from retrying import retry, RetryError
from tqdm import tqdm

# Note: retry_if_status_code_not_ok is defined elsewhere in the script (not shown here).


def backup_task(
    token: str,
    task_id: int,
    account_name: str,
    container_name: str,
    file_format: str,
    request_url: str,
):
    """Back up a single task, or every task when task_id is -1.
    Args:
        token: The token obtained after login.
        task_id: Task id to back up; -1 means back up all tasks.
        account_name: Azure Blob Storage account name.
        container_name: Azure Blob Storage container name.
        file_format: Type of annotation file.
        request_url: CVAT request url.
    """
    try:
        headers = {"Authorization": f"Token {token}"}
        page = 1

        if task_id == -1:
            task_ids = []
            download_all = True
        else:
            task_ids = [task_id]
            download_all = False

        while download_all:
            try:
                response = requests.get(
                    url=request_url + f"/api/tasks?page={page}",
                    headers=headers,
                    timeout=10,
                )
                response.raise_for_status()
            except requests.exceptions.HTTPError as err:
                logging.error("HTTP Error: %s", err)
                break
            except requests.exceptions.ConnectionError as err:
                logging.error("Error Connecting: %s", err)
                break
            except requests.exceptions.Timeout as err:
                logging.error("Timeout Error: %s", err)
                break
            except requests.exceptions.RequestException as err:
                logging.error("Oops: Something Else: %s", err)
                break
            tasks = response.json()
            if not tasks["results"]:
                break
            task_ids.extend([task["id"] for task in tasks["results"]])
            page += 1

        if task_ids:
            logging.info("Found %s tasks need to backup.", len(task_ids))
            failed_tasks = []
            for task_id in tqdm(
                task_ids, desc="Downloading tasks", unit="task", mininterval=5
            ):
                try:
                    success = download_annotation(
                        token,
                        task_id,
                        account_name,
                        container_name,
                        file_format,
                        request_url,
                    )
                except RetryError:
                    success = False
                if not success:
                    failed_tasks.append(task_id)
            if failed_tasks:
                logging.error("Failed to download tasks: %s", failed_tasks)
                return False
            else:
                logging.info("All tasks have been backed up.")
                return True
    except Exception as e:
        logging.error("An error occurred: %s", e)
    return False

@retry(
    retry_on_result=retry_if_status_code_not_ok,
    stop_max_attempt_number=5,
    wait_fixed=8000,
)
def download_annotation(
    download_token: str,
    download_task_id: int,
    account_name: str,
    container_name: str,
    file_format: str,
    request_url: str,
):
    """Download annotation.
    Args:
        download_token: Login token.
        download_task_id: Task id.
        account_name: Azure Blob Storage account name.
        container_name: Azure Blob Storage container name.
        file_format: Type of annotation file.
        request_url: CVAT request url.
    """
    task_url = request_url + "/api/tasks/" + str(download_task_id) + "/annotations"
    task_header = {"Authorization": "Token " + download_token}
    # Get the current date and time
    now = datetime.now()
    # Format the date and time
    dt_string = now.strftime("%y%m%d%H%M%S")
    # Create the filename
    filename = f"{download_task_id}_{dt_string}.zip"
    params = {"action": "download", "format": file_format, "filename": filename}
    response = requests.get(
        url=task_url, params=params, headers=task_header, verify=False, timeout=10
    )
    # 200 means the export finished and the archive was returned; 202 means the
    # export job was only accepted and is still being prepared on the server.
    if response.status_code == 200:
        return True
    return False

Expected Behavior

When I have a large task_id list (for example, 300 entries), around 30 of the task backups fail and the backup endpoint returns 202.

Possible Solution

I added a retry to the backup call, trying 5 times with an interval of 8 seconds between attempts. When I ran the script recently, I found that the same batch of task_ids failed every time.

Context

I want to solve this 202 return value problem so that a batch backup downloads all of the data.

Environment

Python 3.9
CVAT 2.1.0
@youth123-c youth123-c added the bug Something isn't working label May 11, 2024
@PushpakBhoge

@youth123-c The download APIs are normally async; 202 means the task was submitted, so it is actually a success response.

You will find the backup file on some other API; in the UI the JavaScript generally takes care of this. I don't know which the second API is in CVAT's case, but based on network observation you have to keep hitting the same request until you get 201, with the param action='download'.
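
For illustration, a minimal sketch of that polling pattern, assuming the endpoint keeps answering 202 while the export job is being prepared and a normal success code once the archive is ready; the poll_annotation_export name, attempt count and wait interval are placeholders, not CVAT's documented contract:

import time

import requests


def poll_annotation_export(request_url, token, task_id, file_format,
                           max_attempts=30, wait_seconds=5):
    """Re-issue the same export request until the server stops answering 202."""
    url = f"{request_url}/api/tasks/{task_id}/annotations"
    headers = {"Authorization": f"Token {token}"}
    params = {"action": "download", "format": file_format}
    for _ in range(max_attempts):
        response = requests.get(url, params=params, headers=headers, timeout=30)
        if response.status_code == 202:
            # Export is still being prepared on the server; wait and retry.
            time.sleep(wait_seconds)
            continue
        response.raise_for_status()
        return response.content  # the exported archive bytes
    raise TimeoutError(f"Export of task {task_id} did not finish after {max_attempts} attempts")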

Even if you put your request in a while loop with this param, I don't think it will work.
I would recommend using the CVAT SDK: pip install cvat-sdk==2.1.0
https://docs.cvat.ai/docs/api_sdk/sdk/highlevel-api/

Here is one example:

from cvat_sdk import make_client

# CVAT_HOST, CVAT_USER, CVAT_PASS, download_task_id and local_path are placeholders.
with make_client(host=CVAT_HOST, credentials=(CVAT_USER, CVAT_PASS)) as client:
    task_meta = client.tasks.retrieve(download_task_id)
    task_meta.export_dataset(format_name="CVAT for images 1.1", filename=local_path)

@youth123-c
Author

Thank you very much for your help; I will try it using the CVAT SDK.

@bsekachev
Member

Did you sort out how to work with the backup API?

@youth123-c
Author

 def download_annotation(
        self,
        download_task_id,
        account_name,
        container_name,
        file_format,
    ):
        """Download annotation with CVAT SDK.
        Args:
            download_task_id: Task id.
            account_name: Azure Blob Storage account name.
            container_name: Azure Blob Storage container name.
            file_format: Type of annotation file.
        The CVAT host and credentials are taken from self.request_url,
        self.username and self.password.
        """
        try:
            with make_client(
                host=self.request_url, credentials=(self.username, self.password)
            ) as client:
                task_meta = client.tasks.retrieve(download_task_id)
                # Get the current date and time
                now = datetime.now()
                # Format the date and time
                dt_string = now.strftime("%y%m%d%H%M%S")
                local_path = f"{download_task_id}_{dt_string}.zip"
                parsed_url = urlparse(self.request_url)
                hostname = parsed_url.hostname.split(".")[0]
                blob_path = f"cvat_backup/{hostname}/{download_task_id}/{local_path}"
                task_meta.export_dataset(format_name=file_format, filename=local_path)
                return True
        except Exception as e:
            logging.error("Error downloading annotations: %s", str(e))
            return False

I used the CVAT SDK to download the task, but I found that the export_dataset method downloads both images and annotations. If I only want to download the annotations, which method should I use?
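
For reference, a hedged sketch of an annotations-only export, assuming the include_images keyword that newer cvat-sdk releases expose on export_dataset (whether it is available in 2.1.0 is not verified here); the host, credential and task id names are placeholders:

from cvat_sdk import make_client

# Hedged sketch: include_images=False is assumed to skip the image files and
# keep only the annotation data in the exported archive.
with make_client(host=CVAT_HOST, credentials=(CVAT_USER, CVAT_PASS)) as client:
    task = client.tasks.retrieve(download_task_id)
    task.export_dataset(
        format_name="CVAT for images 1.1",
        filename="annotations_only.zip",
        include_images=False,
    )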
