Fix dataset downloading #7864
base: develop
Conversation
- Prolong export cache lifetime on cache requests
- Change 500 to 404 on missing cache file during downloading
- Remove rq job removal after result retrieval
- Make export filenames more distinct
- Fix possible infinite deferring of dependent rq jobs
Walkthrough

The recent updates to the CVAT application bring enhancements to dataset caching, export lock management, concurrency handling, and testing procedures. Dependencies have been updated and new functionality has been added to improve performance and reliability in dataset management within CVAT.
Actionable comments posted: 0
Out of diff range and nitpick comments (3)
cvat/apps/engine/views.py (3)
Line range hint 44-44: Remove unused import to clean up the code.

- import cvat.apps.dataset_manager.views # pylint: disable=unused-import

Line range hint 1966-1966: Remove unnecessary f-string as it does not contain any placeholders.

- server_address = request.scheme + '://'
+ server_address = str(request.scheme) + '://'

Line range hint 2713-2713: Remove unnecessary f-string as it does not contain any placeholders.

- server_address += request.get_host()
+ server_address = server_address + str(request.get_host())
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (10)
- cvat/apps/dataset_manager/apps.py (1 hunks)
- cvat/apps/dataset_manager/default_settings.py (1 hunks)
- cvat/apps/dataset_manager/util.py (2 hunks)
- cvat/apps/dataset_manager/views.py (4 hunks)
- cvat/apps/engine/views.py (4 hunks)
- cvat/requirements/base.in (1 hunks)
- cvat/requirements/base.txt (10 hunks)
- cvat/requirements/development.txt (3 hunks)
- cvat/requirements/production.txt (1 hunks)
- utils/dataset_manifest/requirements.txt (2 hunks)
Files skipped from review due to trivial changes (5)
- cvat/apps/dataset_manager/apps.py
- cvat/apps/dataset_manager/default_settings.py
- cvat/requirements/base.in
- cvat/requirements/development.txt
- utils/dataset_manifest/requirements.txt
Additional Context Used
Ruff (4)
cvat/apps/dataset_manager/views.py (1)
27-27: .util.LockMode imported but unused

cvat/apps/engine/views.py (3)
44-44: cvat.apps.dataset_manager.views imported but unused
1966-1966: f-string without any placeholders
2713-2713: f-string without any placeholders
Additional comments not posted (6)
cvat/requirements/production.txt (1)

13-13: Update of exceptiongroup to 1.2.1 appears safe and beneficial.

cvat/apps/dataset_manager/util.py (1)

93-146: Introduced a robust locking mechanism using Redlock. Ensure thorough testing, especially for edge cases like rapid lock acquisition and release.

cvat/requirements/base.txt (1)

Line range hint 53-325: Updates to package versions in base.txt are minor and likely include improvements or security patches. Verify compatibility with the existing codebase.

cvat/apps/dataset_manager/views.py (1)

Line range hint 27-149: Integration of the new locking mechanism and error handling in dataset management views appears well implemented. Recommend additional testing for robustness.

cvat/apps/engine/views.py (2)

2962-2962: Ensure that the new caching logic in _export_annotations is correctly implemented and tested.

3022-3022: Confirm that the error handling in export_annotations covers all potential failure scenarios effectively.

Verification successful
The output from the revised script provides a comprehensive list of exception handling within the export_annotations method and related areas in the cvat/apps/engine/views.py file. The exceptions handled include ValidationError, PermissionDenied, NotFound, AttributeError, IntegrityError, APIException, ValueError, CloudStorageModel.DoesNotExist, and the general Exception. This indicates a robust error handling mechanism that covers a wide range of potential issues that could arise during the execution of the export_annotations method.

Given this information, it appears that the error handling within the export_annotations method is indeed comprehensive, covering various specific and general exceptions that could occur. This aligns with the initial review comment's concern about ensuring that all potential failure scenarios are effectively covered.

Conclusion: the error handling in the export_annotations method appears to be comprehensive and well implemented, addressing a variety of error conditions that could potentially disrupt the functionality.

- The method handles specific exceptions like ValidationError, PermissionDenied, and NotFound, which are crucial for robust API behavior.
- It also covers database-related errors through AttributeError and IntegrityError.
- General exceptions and unexpected errors are caught by broad Exception handling.
- Special cases like CloudStorageModel.DoesNotExist are also considered, which is important for operations involving external storage.

This thorough approach ensures that the method can gracefully handle errors and provide appropriate responses, thereby enhancing the reliability and user experience of the dataset exporting functionality.
Scripts Executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Check all branches of error handling in the `export_annotations` method to ensure comprehensive coverage.
# Test: Search for error handling branches in the method. Expect: Each branch handles a specific error scenario appropriately.
rg --type python $'export_annotations'
```

Length of output: 68
Script:

```shell
#!/bin/bash
# Description: Re-check all branches of error handling in the `export_annotations` method to ensure comprehensive coverage.
# Test: Search for error handling branches in the method. Expect: Each branch handles a specific error scenario appropriately.
rg 'export_annotations' --files-with-matches | xargs rg 'except'
```

Length of output: 3895
/check

❌ Some checks failed
Actionable comments posted: 1
Out of diff range and nitpick comments (6)
cvat/apps/dataset_manager/util.py (1)
Line range hint 16-41: Consider adding more detailed docstrings for the utility functions to improve maintainability and clarity for other developers.

+ """
+ Creates a zip archive from a directory.
+
+ Args:
+     src_path (str): The source directory path.
+     dst_path (str): The destination zip file path.
+ """

tests/python/rest_api/test_resource_import_export.py (2)

Line range hint 32-57: Comprehensive test cases for exporting resources to cloud storage. Consider adding comments to explain the test scenarios in detail for better maintainability.

+ """
+ Test saving resources to cloud storage with specific configurations.
+ This test covers various object types and ensures that the export functionality
+ respects the specified cloud storage locations and settings.
+ """

Line range hint 182-207: Import functionality tests are well covered. Ensure that edge cases, such as network failures or partial data availability, are also tested.

Would you like me to help by adding some tests for edge cases involving network failures or partial data availability?

cvat/apps/engine/views.py (3)

Line range hint 44-44: Remove unused import to clean up the code.

- import cvat.apps.dataset_manager.views # pylint: disable=unused-import

Line range hint 1967-1967: The f-string used here does not contain any placeholders, which is unnecessary. Consider using a regular string if no formatting is needed.

- server_address = request.scheme + '://'
+ server_address = str(request.scheme) + '://'

Line range hint 2714-2714: Similar to the previous comment, an f-string is used here without placeholders. Use a regular string instead.

- filename_pattern = build_annotations_file_name(
+ filename_pattern = str(build_annotations_file_name(
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- changelog.d/20240508_144202_mzhiltso_fix_dataset_downloading.md (1 hunks)
- cvat/apps/dataset_manager/default_settings.py (1 hunks)
- cvat/apps/dataset_manager/util.py (2 hunks)
- cvat/apps/dataset_manager/views.py (4 hunks)
- cvat/apps/engine/views.py (4 hunks)
- cvat/requirements/base.in (1 hunks)
- cvat/requirements/base.txt (10 hunks)
- cvat/requirements/development.txt (1 hunks)
- cvat/requirements/testing.txt (1 hunks)
- tests/python/rest_api/test_resource_import_export.py (5 hunks)
- utils/dataset_manifest/requirements.txt (2 hunks)
Files skipped from review due to trivial changes (1)
- cvat/requirements/testing.txt
Files skipped from review as they are similar to previous changes (4)
- cvat/apps/dataset_manager/default_settings.py
- cvat/requirements/base.in
- cvat/requirements/base.txt
- cvat/requirements/development.txt
Additional Context Used
Ruff (3)
cvat/apps/engine/views.py (3)
44-44: cvat.apps.dataset_manager.views imported but unused
1967-1967: f-string without any placeholders
2714-2714: f-string without any placeholders
Additional comments not posted (5)
changelog.d/20240508_144202_mzhiltso_fix_dataset_downloading.md (1)

1-5: Changelog entry is clear and properly linked to the PR.

utils/dataset_manifest/requirements.txt (2)

8-9: Ensure that non-binary flags for lxml and xmlsec are required for your project's needs.

Verification successful

The comprehensive search across the codebase confirms the consistent use of --no-binary flags for lxml and xmlsec across various requirement files and Dockerfiles. This indicates a deliberate and justified choice to compile these libraries from source, likely for compatibility or optimization reasons specific to the project's needs. The presence of related development libraries in Dockerfiles further supports this requirement.

- utils/dataset_manifest/requirements.txt: Both lxml and xmlsec are set to compile from source.
- cvat/requirements/: Multiple environment-specific files (base, testing, development, production) also enforce non-binary installation.
- Dockerfile: Includes libraries like libxmlsec1-dev and libxmlsec1-openssl, which are related to xmlsec, suggesting system-level dependencies that might require source compilation.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify compatibility of updated package versions.
# Test: Search for the package usage in the Python files. Expect: No deprecated functions used.
rg --type py 'lxml|xmlsec'
```

Length of output: 28

Script:

```shell
#!/bin/bash
# Description: Comprehensive search for any references to 'lxml' or 'xmlsec' in the entire codebase.
# Search all files for 'lxml' or 'xmlsec' to ensure no usage is missed.
rg 'lxml|xmlsec'
```

Length of output: 1026

21-21: Update to tqdm==4.66.4 looks good. Ensure it's tested for compatibility with your project's usage.

cvat/apps/dataset_manager/util.py (1)

104-141: Locking mechanism using Redlock is implemented correctly. Ensure thorough testing, especially for edge cases in concurrent environments.

cvat/apps/dataset_manager/views.py (1)

179-227: Robust error handling in clear_export_cache. Ensure that all possible exceptions are considered and handled appropriately.
cvat/requirements/base.in
Outdated

```
@@ -38,6 +38,7 @@ patool==1.12
pdf2image==1.14.0
Pillow>=10.3.0
pottery==3.0.0
```
Please use a more relaxed requirement, e.g. `~=3.0`.
Updated.
I see you used `~=3.0.0`. Any reason not to make it `~=3.0`? `~=3.0.0` would prohibit `3.1`, for example.
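The difference between the two specifiers is easy to check mechanically. Below is a rough pure-Python sketch of the compatible-release rule for plain numeric versions (simplified from PEP 440; real resolution is done by pip, and the function name here is made up for illustration):

```python
def compatible(spec: str, version: str) -> bool:
    """Simplified check of pip's compatible-release operator '~='
    for plain numeric versions (a sketch, not PEP 440-complete)."""
    base = [int(p) for p in spec.split(".")]
    ver = [int(p) for p in version.split(".")]
    # ~=X.Y means >=X.Y, ==X.*; ~=X.Y.Z means >=X.Y.Z, ==X.Y.*
    if ver[:len(base) - 1] != base[:-1]:
        return False
    return ver[:len(base)] >= base

# ~=3.0 admits 3.1, but ~=3.0.0 pins to the 3.0.x series:
print(compatible("3.0", "3.1"))    # True
print(compatible("3.0.0", "3.1"))  # False
```

In short, `~=` drops the last component of the spec when deciding which release series still matches, which is why the extra `.0` makes the requirement stricter.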
```python
for key in dir(default_settings):
    if key.isupper() and not hasattr(settings, key):
        setattr(settings, key, getattr(default_settings, key))
```
You're not supposed to alter settings outside of a settings file: https://docs.djangoproject.com/en/4.2/topics/settings/#altering-settings-at-runtime
I would suggest a DRF-like approach, where an app has its own settings-like object that proxies the relevant Django settings.
We researched this earlier in #6204 (comment). Basically, it seems to be the only correct place for this. I don't want to introduce an app-specific way to access the settings like it's done in DRF, as the existing one is the one people got used to. Maybe we could add our own way to access settings in CVAT, but that's outside of the scope of this PR.
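For context, the DRF-like alternative mentioned above usually looks something like the following sketch: an app-local object that reads from the project settings when a key is defined and falls back to the app defaults otherwise, without mutating django.conf.settings at runtime. Names and the fallback key are illustrative, not CVAT code:

```python
from types import SimpleNamespace

class AppSettings:
    """Read-only proxy: project settings first, app defaults second."""

    def __init__(self, project_settings, defaults):
        self._settings = project_settings
        self._defaults = defaults

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for
        # every settings key, since keys live on the wrapped objects.
        try:
            return getattr(self._settings, name)
        except AttributeError:
            return getattr(self._defaults, name)

defaults = SimpleNamespace(DATASET_CACHE_TTL=3600)
project = SimpleNamespace(DATASET_CACHE_TTL=60)  # project override wins
app_settings = AppSettings(project, defaults)
print(app_settings.DATASET_CACHE_TTL)  # 60
```

The trade-off discussed in the thread: this avoids runtime mutation of Django settings, but introduces a second, app-specific way to read configuration.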
```python
lock = Redlock(
    key=export_path,
    masters={django_rq.get_connection(settings.CVAT_QUEUES.EXPORT_DATA.value)},
```
You should probably use `with` on the connection.
I couldn't find any place where this is done or recommended, so I think, django-rq is supposed to manage open connections internally. rq docs only recommend managing connections manually, but we get these connections from django-rq. One more potential problem here is that the connections can be shared or cached, and doing so could close some connection unexpectedly.
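The overall shape of the lock wrapper under discussion is roughly the following. This is a hedged sketch that substitutes a plain `threading.Lock` for pottery's `Redlock` (so the Redis connection question above does not arise here); the names mirror the PR but are illustrative:

```python
import threading
from contextlib import contextmanager

class LockNotAvailableError(Exception):
    pass

@contextmanager
def get_export_lock(lock: threading.Lock, *, acquire_timeout: float):
    # Try to take the lock within the timeout; fail loudly instead of
    # blocking the worker forever.
    acquired = lock.acquire(timeout=acquire_timeout)
    if not acquired:
        raise LockNotAvailableError
    try:
        yield
    finally:
        lock.release()

lock = threading.Lock()
with get_export_lock(lock, acquire_timeout=1.0):
    pass  # exclusive section: write or read the export cache file
```

With a distributed lock the same pattern applies, except acquisition can also fail because another process holds the key, not just another thread.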
cvat/apps/dataset_manager/util.py
Outdated
if isinstance(acquire_timeout, timedelta): | ||
acquire_timeout = acquire_timeout.total_seconds() | ||
elif acquire_timeout and acquire_timeout < 0: | ||
raise ValueError("acquire_timeout must be a positive number") | ||
elif acquire_timeout is None: | ||
acquire_timeout = -1 | ||
|
||
if isinstance(ttl, timedelta): | ||
ttl = ttl.total_seconds() | ||
elif not ttl or ttl < 0: | ||
raise ValueError("ttl must be a positive number") |
I think this fails to handle the case(s) where `acquire_timeout` or `ttl` are `timedelta` objects and negative.
Updated
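The fix amounts to normalizing a `timedelta` to seconds before the sign check, so a negative `timedelta` can no longer slip past validation. A hedged sketch of that ordering (illustrative, not the actual CVAT code):

```python
from datetime import timedelta

def normalize_timeout(value, *, none_means=-1):
    """Convert a number, timedelta, or None into seconds, rejecting
    negative values regardless of the input type."""
    if value is None:
        return none_means
    if isinstance(value, timedelta):
        # Normalize FIRST, so the sign check below applies to
        # timedelta inputs as well.
        value = value.total_seconds()
    if value < 0:
        raise ValueError("timeout must be a non-negative number")
    return value

print(normalize_timeout(timedelta(minutes=1)))  # 60.0
```

The original snippet checked the sign only on the non-`timedelta` branch, which is exactly the gap pointed out above.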
cvat/apps/dataset_manager/views.py
Outdated

```python
cache_ttl = get_export_cache_ttl(instance_type)

instance_timestamp = None
with suppress(AttributeError):  # backward compatibility
```
Can you elaborate on the "backwards compatibility" comment? How could an `AttributeError` occur here?
It can come from `match.group()`, if `match` is `None`, i.e. if the file name doesn't contain the added fragment with the timestamp. This can happen for jobs created before the patch. I'm not 100% sure backward compatibility is a problem with in-memory Redis.

Updated.
> I'm not 100% sure backward compatibility is a problem with in-memory Redis.

FWIW, we do have persistence enabled by default for in-memory Redis, so maintaining backwards compatibility is useful.
Ok, good to know. In this case, these efforts make sense. Maybe I will add a test for this.
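To illustrate the failure mode being discussed: `re.search` returns `None` when the filename lacks the timestamp fragment, and calling `.group()` on `None` raises `AttributeError`. The pattern and filename format below are invented for the example, not CVAT's actual naming scheme:

```python
import re

# Hypothetical: new-style export names embed an instance timestamp.
TIMESTAMP_RE = re.compile(r"-instance(?P<ts>\d+(?:\.\d+)?)")

def get_file_instance_timestamp(file_path: str) -> float:
    # Old-style names (jobs created before the patch) have no match,
    # so .search() yields None and .group() raises AttributeError --
    # the case the suppress(AttributeError) above guards against.
    return float(TIMESTAMP_RE.search(file_path).group("ts"))

print(get_file_instance_timestamp("job_5-instance1715000000.0.zip"))  # 1715000000.0
```

A test for the old-style name would simply assert that the `AttributeError` path is taken and the caller falls back to `instance_timestamp = None`.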
cvat/apps/dataset_manager/views.py
Outdated

```python
    acquire_timeout=EXPORT_CACHE_LOCK_TIMEOUT,
    ttl=rq.get_current_job().timeout,
):
    if 'job' in file_path:
```
This seems even less robust than `get_file_instance_timestamp`. `job`, `task`, `project` are all common words, and I could easily see them occurring in the file path somewhere other than expected.
Improved the parsing a little.
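One way to make this less fragile than a substring check is to match the instance type only as a delimited token of the file name, rather than anywhere in the full path. A hedged sketch, with a naming convention invented for the example:

```python
import os
import re

# Match 'project', 'task', or 'job' only as the leading,
# underscore-delimited token of the base name (illustrative rule).
INSTANCE_RE = re.compile(r"^(?P<type>project|task|job)_")

def parse_instance_type(file_path: str):
    m = INSTANCE_RE.match(os.path.basename(file_path))
    return m.group("type") if m else None

print(parse_instance_type("/cache/job_5-annotations.zip"))  # job
print(parse_instance_type("/cache/my_job_backup.zip"))      # None
```

Anchoring on the base name means a directory such as `/data/jobs/` can no longer trigger a false match the way `'job' in file_path` could.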
cvat/apps/dataset_manager/views.py
Outdated

```python
instance_timestamp = None
with suppress(AttributeError):  # backward compatibility
    instance_timestamp = get_file_instance_timestamp(file_path)
if instance_timestamp and instance_timestamp != file_ctime:
```
Seems like the second half of this condition would always be false, because the `file_path` passed to the function is derived from the `file_ctime` passed to the function.
Updated, removed the check.
… zm/fix-dataset-downloading
/check

❌ Some checks failed
cvat/apps/dataset_manager/util.py
Outdated

```python
    raise LockNotAvailableError

finally:
    with suppress(ReleaseUnlockedLock):
```
Maybe we could at least log a warning if the lock was unlocked? It seems like it could cause hard-to-debug issues, so some diagnostics would be helpful.
Updated
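The diagnostics change can be sketched like this: catch the already-unlocked case explicitly and emit a warning, instead of suppressing it silently. `ReleaseUnlockedLock` is the name of pottery's exception; here it is a local stand-in so the sketch runs without Redis, and the function name is invented:

```python
import logging

logger = logging.getLogger("dataset_manager")

class ReleaseUnlockedLock(Exception):
    """Local stand-in for pottery's exception of the same name."""

def release_with_warning(lock, key: str) -> None:
    # Surface unexpected early releases (e.g. the lock's TTL expired
    # while work was still running) instead of hiding them behind
    # contextlib.suppress().
    try:
        lock.release()
    except ReleaseUnlockedLock:
        logger.warning("export lock %s was already released", key)
```

A log line here is cheap and turns a silent correctness hazard (two workers briefly holding the "same" lock) into something visible during debugging.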
cvat/apps/dataset_manager/util.py
Outdated

```python
    return f"export_lock:(unknown)"

@contextmanager
def get_dataset_cache_lock(
```
- def get_dataset_cache_lock(
+ def get_export_lock(

This would be consistent with `make_export_lock_key`.
Updated
```
#
# This file is autogenerated by pip-compile-multi
# To update, run:
#
#    pip-compile-multi
#
-r ../../utils/dataset_manifest/requirements.txt
--no-binary lxml
--no-binary xmlsec
```
@Marishka17 Question for you: how did you originally regenerate the requirements files in #7845? These `--no-binary` options are not supposed to appear here (since they aren't in base.in), but they do appear if I regenerate the files myself (and here we see that they appeared when @zhiltsov-max did it, too). How did you get them to not appear?
/check

❌ Some checks failed
Actionable comments posted: 1
Out of diff range and nitpick comments (1)
cvat/apps/engine/views.py (1)

Line range hint 44-44: Remove unused import.

- import cvat.apps.dataset_manager.views # pylint: disable=unused-import
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (12)
- cvat/apps/dataset_manager/default_settings.py (1 hunks)
- cvat/apps/dataset_manager/tests/test_rest_api_formats.py (9 hunks)
- cvat/apps/dataset_manager/util.py (2 hunks)
- cvat/apps/dataset_manager/views.py (3 hunks)
- cvat/apps/engine/tests/test_rest_api_3D.py (3 hunks)
- cvat/apps/engine/tests/utils.py (4 hunks)
- cvat/apps/engine/views.py (4 hunks)
- cvat/requirements/base.in (1 hunks)
- cvat/requirements/base.txt (13 hunks)
- cvat/requirements/development.txt (3 hunks)
- cvat/requirements/testing.in (1 hunks)
- cvat/requirements/testing.txt (1 hunks)
Files not reviewed due to errors (1)
- cvat/apps/dataset_manager/util.py (no review received)
Files skipped from review due to trivial changes (5)
- cvat/apps/dataset_manager/default_settings.py
- cvat/apps/engine/tests/test_rest_api_3D.py
- cvat/requirements/base.in
- cvat/requirements/development.txt
- cvat/requirements/testing.in
Additional Context Used
Ruff (4)
cvat/apps/dataset_manager/views.py (1)
29-29: .util.EXPORT_CACHE_DIR_NAME imported but unused

cvat/apps/engine/views.py (3)
44-44: cvat.apps.dataset_manager.views imported but unused
1967-1967: f-string without any placeholders
2714-2714: f-string without any placeholders
Additional comments not posted (33)
cvat/requirements/testing.txt (3)
9-10: Ensure consistency of --no-binary options across all requirements files.

14-14: LGTM! The update to fakeredis[lua] ensures compatibility with Lua scripts.

16-16: LGTM! The addition of lupa is appropriate for Lua scripting support.

cvat/apps/engine/tests/utils.py (2)

51-91: LGTM! The clear_rq_jobs function is well-structured and includes the necessary error handling.

95-96: LGTM! The _clear_rq_jobs method improves test isolation and reliability.

cvat/requirements/base.txt (5)

9-10: Ensure consistency of --no-binary options across all requirements files.

56-56: LGTM! The update to cryptography includes important security patches and bug fixes.

74-74: LGTM! The update to django includes important security patches and bug fixes.

105-105: LGTM! The update to django-health-check includes important security patches and bug fixes.

189-190: LGTM! The addition of mmh3 and pottery is appropriate for the new locking mechanisms.

Also applies to: 215-216
cvat/apps/dataset_manager/views.py (12)
55-60: LGTM! The function correctly returns the TTL for different database instances.

60-77: LGTM! The initial setup for logging and determining the database instance is correct.

77-79: LGTM! The cache TTL and directory setup is correct.

85-95: LGTM! The export filename creation and directory preparation are correct.

97-127: LGTM! The locking mechanism and export process are correctly implemented.

130-137: LGTM! The exception handling is correctly implemented.

162-164: LGTM! The initial setup and backward compatibility comment are correct.

166-173: LGTM! The locking mechanism and file existence check are correctly implemented.

175-193: LGTM! The cache TTL check and file removal process are correctly implemented.

200-201: LGTM! The function correctly returns a list of export formats.

Line range hint 203-204: LGTM! The function correctly returns a list of import formats.

Line range hint 206-210: LGTM! The function correctly returns a dictionary of importers and exporters.

cvat/apps/dataset_manager/tests/test_rest_api_formats.py (8)
11-11: New imports are necessary for concurrency handling and test enhancements.

Also applies to: 17-19, 22-23, 25-25

97-97: Inheriting _DbTestBase from ApiTestBase ensures consistency and consolidates the test base classes.

419-420: Adding _clear_rq_jobs ensures a clean test environment and avoids interference from previous tests.

526-527: Adding _clear_rq_jobs ensures a clean test environment and avoids interference from previous tests.

613-614: Adding _clear_rq_jobs ensures a clean test environment and avoids interference from previous tests.

1265-1590: The ExportBehaviorTest class and its methods are well-structured and necessary for testing concurrency handling in export operations.

2058-2059: Adding _clear_rq_jobs ensures a clean test environment and avoids interference from previous tests.

2140-2141: Adding _clear_rq_jobs ensures a clean test environment and avoids interference from previous tests.

cvat/apps/engine/views.py (3)
2963-2964: LGTM! Ensure the logic for handling cache TTL and instance update time is correct.

Also applies to: 2967-2967, 2969-2969

Verification successful

The _export_annotations function in cvat/apps/engine/views.py correctly handles the cache TTL and instance update time. The function retrieves the cache TTL using dm.views.get_export_cache_ttl(db_instance) and the instance update time using timezone.localtime(db_instance.updated_date). If the db_instance is a Project, it updates the instance_update_time to the maximum update time of its tasks.

- The cache TTL is correctly retrieved and used in the function.
- The instance update time is correctly retrieved and updated if necessary.

Therefore, the logic for handling cache TTL and instance update time is correct.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify the logic for handling cache TTL and instance update time.
# Test: Search for the function usage. Expect: Correct handling of cache TTL and instance update time.
rg --type python $'_export_annotations'
```

Length of output: 69

Script:

```shell
#!/bin/bash
# Description: Verify the logic for handling cache TTL and instance update time.
# Test: Search for the function usage. Expect: Correct handling of cache TTL and instance update time.
rg --type py '_export_annotations'
```

Length of output: 1280

Script:

```shell
#!/bin/bash
# Description: Extract the implementation of `_export_annotations` function from `cvat/apps/engine/views.py`.
# Extract the function implementation.
ast-grep --lang python --pattern $'def _export_annotations($_, $_, $$$):\n    $$$' cvat/apps/engine/views.py
```

Length of output: 13779
3034-3035: LGTM! Ensure the logic for handling cloud storage and temporary files is correct.

Also applies to: 3036-3036, 3037-3037, 3038-3038, 3039-3039, 3040-3040, 3041-3041, 3042-3042, 3043-3043, 3044-3044, 3045-3045, 3046-3046, 3047-3047, 3048-3048, 3049-3049

3078-3078: LGTM! Ensure the logic for handling cloud storage and temporary files is correct.

Also applies to: 3093-3094

Verification successful

The _import_project_dataset function and its usages have been extracted for review. Here are the relevant code segments:

Function Definition

```python
def _import_project_dataset(request, rq_id_template, rq_func, db_obj, format_name,
                            filename=None, conv_mask_to_poly=True, location_conf=None):
    format_desc = {f.DISPLAY_NAME: f
        for f in dm.views.get_import_formats()}.get(format_name)
    if format_desc is None:
        raise serializers.ValidationError(
            "Unknown input format '{}'".format(format_name))
    elif not format_desc.ENABLED:
        return Response(status=status.HTTP_405_METHOD_NOT_ALLOWED)

    rq_id = rq_id_template.format(db_obj.pk, request.user)
```

Function Usages

Usage in the import_annotations method:

```python
if request.method in {'POST', 'OPTIONS'}:
    return self.import_annotations(
        request=request,
        db_obj=self._object,
        import_func=_import_project_dataset,
        rq_func=dm.project.import_dataset_as_project,
        rq_id_template=self.IMPORT_RQ_ID_TEMPLATE
    )
```

Usage in another context:

```python
conv_mask_to_poly = to_bool(request.query_params.get('conv_mask_to_poly', True))
tmp_dir = self._object.get_tmp_dirname()
uploaded_file = None
if os.path.isfile(os.path.join(tmp_dir, filename)):
    uploaded_file = os.path.join(tmp_dir, filename)
return _import_project_dataset(
    request=request,
    filename=uploaded_file,
    rq_id_template=self.IMPORT_RQ_ID_TEMPLATE,
    rq_func=dm.project.import_dataset_as_project,
    db_obj=self._object,
)
```

Analysis

- The function _import_project_dataset includes logic to handle different dataset formats and raises appropriate errors for unknown or disabled formats.
- The usages of the function correctly pass the necessary parameters, including handling temporary files (uploaded_file) and cloud storage configurations.

Based on the extracted code segments and the analysis, the logic for handling cloud storage and temporary files appears to be correctly implemented.

Final Response

The logic for handling cloud storage and temporary files in the _import_project_dataset function and its usages is correctly implemented.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify the logic for handling cloud storage and temporary files.
# Test: Search for the function usage. Expect: Correct handling of cloud storage and temporary files.
rg --type python $'_import_project_dataset'
```

Length of output: 73

Script:

```shell
#!/bin/bash
# Description: Verify the logic for handling cloud storage and temporary files.
# Test: Search for the function usage. Expect: Correct handling of cloud storage and temporary files.
rg --type py '_import_project_dataset'
```

Length of output: 356

Script:

```shell
#!/bin/bash
# Description: Extract the definition and usages of `_import_project_dataset` function for detailed review.
# Extract the function definition
rg --type py --context 10 'def _import_project_dataset' cvat/apps/engine/views.py
# Extract the usages of the function
rg --type py --context 5 '_import_project_dataset' cvat/apps/engine/views.py
```

Length of output: 2662
```python
    get_export_cache_dir, make_export_filename,
    parse_export_filename
)
from .util import EXPORT_CACHE_DIR_NAME # pylint: disable=unused-import
```
Remove the unused import `EXPORT_CACHE_DIR_NAME`.

- from .util import EXPORT_CACHE_DIR_NAME # pylint: disable=unused-import
… zm/fix-dataset-downloading
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- cvat/apps/dataset_manager/tests/test_rest_api_formats.py (9 hunks)
Additional comments not posted (6)
cvat/apps/dataset_manager/tests/test_rest_api_formats.py (6)
11-23: New imports for concurrency handling and utility functions.

97-97: New class _DbTestBase for database-related test setup and utility functions.

1265-1265: New class ExportBehaviorTest for testing concurrent export and cleanup operations.

1368-1590: New test method test_concurrent_export_and_cleanup for testing concurrent export and cleanup operations.

1591-1787: New test method test_concurrent_download_and_cleanup for testing concurrent download and cleanup operations.

1789-1811: New test method test_export_can_create_file_and_cleanup_job for testing export file creation and cleanup job scheduling.
Actionable comments posted: 1
Out of diff range and nitpick comments (3)

cvat/apps/engine/views.py (3)

Line range hint 44-44: Remove unused import to clean up the code.

```diff
- import cvat.apps.dataset_manager.views # pylint: disable=unused-import
```

Line range hint 1967-1967: The f-string is used without placeholders, which is unnecessary. Convert it to a regular string.

```diff
- instance_timestamp = datetime.strftime(instance_update_time, f"%Y_%m_%d_%H_%M_%S")
+ instance_timestamp = datetime.strftime(instance_update_time, "%Y_%m_%d_%H_%M_%S")
```

Line range hint 2714-2714: The f-string is used without placeholders, which is unnecessary. Convert it to a regular string.

```diff
- instance_timestamp = datetime.strftime(instance_update_time, f"%Y_%m_%d_%H_%M_%S")
+ instance_timestamp = datetime.strftime(instance_update_time, "%Y_%m_%d_%H_%M_%S")
```
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- cvat/apps/engine/views.py (5 hunks)
Additional Context Used
Ruff (3)
cvat/apps/engine/views.py (3)
- 44-44: `cvat.apps.dataset_manager.views` imported but unused
- 1967-1967: f-string without any placeholders
- 2714-2714: f-string without any placeholders
Additional comments not posted (4)
cvat/apps/engine/views.py (4)
- 2963-2975: Ensure proper handling of the `instance_update_time` for projects with multiple tasks. The logic to determine the latest update time across all tasks and the project itself is sound.
- Line range hint 2976-3055: The logic to handle outdated exports and re-enqueue jobs is complex but correctly implements the requirements. It checks if the existing job's data is outdated compared to the instance update time and handles it accordingly. This includes canceling outdated jobs and re-enqueuing them if necessary.
- 3021-3033: The logic to handle file existence and re-enqueue jobs if the file does not exist is correctly implemented. It ensures that files are re-exported if they are not found, which is crucial for maintaining data integrity.
- 3039-3055: Properly handle failed jobs by providing detailed error information to the client. This is crucial for debugging and understanding why a job failed.
```python
return Response(
    'A result for exporting job was not found for finished RQ job',
    status=status.HTTP_500_INTERNAL_SERVER_ERROR
)

with dm.util.get_export_cache_lock(
    file_path, ttl=60,  # request timeout
):
    if action == "download":
        if not osp.exists(file_path):
            return Response(
                "The exported file has expired, please retry exporting",
                status=status.HTTP_404_NOT_FOUND
            )
```
Improve error handling by specifying the type of HTTP error directly in the response.

```diff
- return Response(
-     'A result for exporting job was not found for finished RQ job',
-     status=status.HTTP_500_INTERNAL_SERVER_ERROR
- )
+ return HttpResponseNotFound('The exported file has expired, please retry exporting')
```
```python
return HttpResponseNotFound('The exported file has expired, please retry exporting')

with dm.util.get_export_cache_lock(
    file_path, ttl=60,  # request timeout
):
    if action == "download":
        if not osp.exists(file_path):
            return Response(
                "The exported file has expired, please retry exporting",
                status=status.HTTP_404_NOT_FOUND
            )
```
Actionable comments posted: 6
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- cvat/apps/dataset_manager/tests/test_rest_api_formats.py (9 hunks)
Additional comments not posted (2)
cvat/apps/dataset_manager/tests/test_rest_api_formats.py (2)
17-17: Ensure proper usage of context managers to handle resources efficiently.

Verification successful

Ensure proper usage of context managers to handle resources efficiently

The usage of `ExitStack` and `contextmanager` in cvat/apps/dataset_manager/tests/test_rest_api_formats.py appears to be correct and follows proper resource management practices. Here are the key points:

- `@contextmanager` decorators: These are used to define context managers for handling processes and locks, ensuring that resources are properly managed and released.
- `ExitStack`: This is used to manage multiple context managers, ensuring that all resources are properly cleaned up even if an error occurs.

The code snippets provided show that these constructs are used appropriately to handle resources efficiently.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify the usage of ExitStack and contextmanager in the file to ensure proper resource management.
# Test: Search for ExitStack and contextmanager usage. Expect: Proper usage patterns.
rg --type python $'ExitStack|contextmanager' cvat/apps/dataset_manager/tests/test_rest_api_formats.py
```

Length of output: 133

Script:

```shell
#!/bin/bash
# Description: Verify the usage of ExitStack and contextmanager in the file to ensure proper resource management.
# Test: Search for ExitStack and contextmanager usage. Expect: Proper usage patterns.
rg 'ExitStack|contextmanager' cvat/apps/dataset_manager/tests/test_rest_api_formats.py
```

Length of output: 269

Script:

```shell
#!/bin/bash
# Description: Extract surrounding lines of ExitStack and contextmanager usage for detailed verification.
# Extract 5 lines before and after each match to understand the context.
rg -C 5 'ExitStack|contextmanager' cvat/apps/dataset_manager/tests/test_rest_api_formats.py
```

Length of output: 2772
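The pattern this verification describes — `@contextmanager` helpers entered through an `ExitStack` so every resource is released even when the number of resources is dynamic — can be shown in miniature (an illustration, not the CVAT test code):

```python
from contextlib import ExitStack, contextmanager

events = []

@contextmanager
def managed_resource(name):
    # Stands in for the test helpers that spawn processes or take locks.
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        events.append(f"exit {name}")

with ExitStack() as stack:
    # Resources are entered in a loop; ExitStack guarantees all of them
    # are exited, in reverse order, even if the body raises.
    for name in ("proc-1", "proc-2", "lock"):
        stack.enter_context(managed_resource(name))
    events.append("body")
```

After the `with` block, `events` ends with `exit lock`, `exit proc-2`, `exit proc-1` — cleanup runs in reverse entry order.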
1792-1792: Ensure proper handling of file usage conflicts during cleanup operations.

Verification successful

Ensure proper handling of file usage conflicts during cleanup operations

The search results indicate that `LockNotAvailableError` and `FileIsBeingUsedError` are indeed referenced in the test file cvat/apps/dataset_manager/tests/test_rest_api_formats.py. This suggests that there are mechanisms in place to handle file usage conflicts during cleanup operations.

- `LockNotAvailableError` is imported and used multiple times.
- `FileIsBeingUsedError` is also imported and used in assertions.

These references confirm that the code includes error handling for file usage conflicts.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify that file usage conflicts are handled correctly in cleanup operations.
# Test: Search for conflict handling in cleanup operations. Expect: Proper error handling mechanisms.
rg --type python $'LockNotAvailableError|FileIsBeingUsedError' cvat/apps/dataset_manager/tests/test_rest_api_formats.py
```

Length of output: 151

Script:

```shell
#!/bin/bash
# Description: Verify that file usage conflicts are handled correctly in cleanup operations.
# Test: Search for conflict handling in cleanup operations. Expect: Proper error handling mechanisms.
rg --type py $'LockNotAvailableError|FileIsBeingUsedError' cvat/apps/dataset_manager/tests/test_rest_api_formats.py
```

Length of output: 1084
```python
    def setUp(self):
        self.client = APIClient()


class _DbTestBase(ApiTestBase):
```
Refactor `_DbTestBase` to improve modularity and readability.

```diff
- class _DbTestBase(ApiTestBase):
+ class DatabaseTestBase(ApiTestBase):
```

Renaming `_DbTestBase` to `DatabaseTestBase` improves readability and follows Python naming conventions more closely.
```python
class DatabaseTestBase(ApiTestBase):
```
Doing this would make this class included in the test suites, which is not what's needed.
@zhiltsov-max, could you please clarify if there's a specific configuration in your project that includes classes in test suites based on different criteria from the default Python `unittest` discovery rules? This information will help ensure that any changes respect your project's testing setup.
Basically, we have 2 types of the server tests - integrated (relying on unittest and django) and separate ones (executed with pytest). The configuration for the 1st group is here https://github.com/cvat-ai/cvat/blob/develop/cvat/settings/testing.py, and the second group is placed here https://github.com/cvat-ai/cvat/tree/develop/tests/python.
But overall, we use the default discovery rules for unittest.
@zhiltsov-max, thank you for the clarification. Given that your project uses the default discovery rules for unittest, it's clear why the class name `_DbTestBase` was chosen to avoid automatic inclusion in the test suites. Sticking with this naming convention or choosing another name that doesn't start with `Test` would be the best approach to ensure it remains excluded from the test suites. If there's anything else you need, feel free to ask!
@zhiltsov-max, thank you for providing the details about the testing setup. Based on the information gathered:

- Integrated Tests: The `testing.py` configuration for integrated tests uses `DiscoverRunner` from Django, which discovers tests based on the default `unittest` discovery rules. This means it looks for classes that inherit from `unittest.TestCase` and whose names start with 'Test'.
- Separate Tests: The separate tests executed with `pytest` are located in the `tests/python` directory. Pytest is more flexible with test discovery but generally follows similar conventions, looking for files and classes that start with 'test_'.

Given this setup, renaming `_DbTestBase` to `DatabaseTestBase` could indeed cause it to be included in test suites unintentionally, especially if it matches the naming patterns used for test discovery.

To avoid this issue while still improving readability, we can consider a different name that does not start with 'Test' or 'Database'. For example, `BaseDbTest` or `DbTestBaseClass` could be alternatives that maintain clarity without affecting test discovery.

Would you prefer one of these alternatives, or do you have another suggestion in mind?
Analysis chain
Scripts executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Fetch and display the content of the testing configuration for integrated tests
cat cvat/settings/testing.py

# List the contents of the directory for separate tests to understand its structure and any specific configurations
ls -R tests/python
```

Length of output: 4641
```diff
@@ -1265,6 +1262,756 @@
     data_from_task_after_upload = self._get_data_from_task(task_id, include_images)
     compare_datasets(self, data_from_task_before_upload, data_from_task_after_upload)

+class ExportBehaviorTest(_DbTestBase):
```
Enhance the `ExportBehaviorTest` class with additional comments for better understanding.

```diff
+ # This class tests the behavior of concurrent export and cleanup operations.
  class ExportBehaviorTest(_DbTestBase):
```

Adding a comment explaining the purpose of `ExportBehaviorTest` helps new developers understand the context quickly.
```python
# This class tests the behavior of concurrent export and cleanup operations.
class ExportBehaviorTest(_DbTestBase):
```
Basically, this is included in the names of the tests. They are not only focused on concurrency.
@zhiltsov-max, thank you for your feedback. You're right; the tests within the `ExportBehaviorTest` class address more than just concurrency issues. I'll adjust the comment to better reflect the comprehensive testing approach that includes various behaviors of the export and cleanup operations. Here's the updated comment:

```python
# This class tests various behaviors of export and cleanup operations, including but not limited to concurrency issues.
class ExportBehaviorTest(_DbTestBase):
```

This should provide a clearer understanding of the class's purpose for anyone reviewing the code.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #7864      +/-   ##
===========================================
+ Coverage    83.35%   83.39%   +0.04%
===========================================
  Files          380      382       +2
  Lines        40032    40152     +120
  Branches      3766     3766
===========================================
+ Hits         33367    33485     +118
- Misses        6665     6667       +2
```
Motivation and context

This PR addresses several problems:

This PR fixes the problems by the following:

- finished RQ export jobs are not removed automatically on result retrieval. Now, they just use the export cache lifetime instead (should be continued in another PR)

How has this been tested?
Checklist

- I submit my changes into the `develop` branch
- I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

- I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit

- New Features
- Bug Fixes
- Refactor
- Dependencies: added `pottery` and `mmh3`; updated `cryptography`, `Django`, `jinja2`, and more.
- Testing