Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make filer compatible with s3 when handling /buckets/../.. paths (an entry can be both a file and a directory) #4331

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

shichanglin5
Copy link
Contributor

@shichanglin5 shichanglin5 commented Mar 21, 2023

What problem are we solving?

Due to filer's mandatory compatibility with the path specification of posix, many files (both object files and logical directories in ceph s3) cannot be migrated from ceph to seaweed.

When migrating these files, a response of 409 (prompt 'existing xxx is a directory' or 'existing xxx is a file') will be generated, which is very frustrating

How are we solving the problem?

Make filer compatible with s3 when handling /buckets/../.. paths (an entry can be both a file and a directory)

How is the PR tested?

Checks

  • I have added unit tests if possible.
  • I will add related wiki document changes and link to this PR after merging.

@github-advanced-security
Copy link

You have successfully added a new CodeQL configuration .github/workflows/codeql.yml:CodeQL-Build. As part of the setup process, we have scanned this repository and found 47 existing alerts. Please check the repository Security tab to see all alerts.

@chrislusf
Copy link
Collaborator

chrislusf commented Mar 21, 2023

  • Prefer not to introduce one more option. This option is only for object storage. It can be auto turned on if the file is in a bucket, where the prefix has /buckets/.../.
  • This approach is using one entry for both a directory and a file. Listing directory, deleting files, and maybe some other operations, are tricky.

…entry can be both a file and a directory)

Signed-off-by: changlin.shi <changlin.shi@ly.com>
@shichanglin5 shichanglin5 changed the title Add the ForPosixPathStyle option, default to true. When set to false, filer supports a key corresponding to either a path or a directory entry. This makes filer compatible with the path organization method used by S3 Make filer compatible with s3 when handling /buckets/../.. paths (an entry can be both a file and a directory) Mar 22, 2023
@shichanglin5
Copy link
Contributor Author

shichanglin5 commented Mar 22, 2023

at weed/pb/filer_pb/filer_pb_helper.go:20

func (entry *Entry) IsDirectoryKeyObject() bool {
	return entry.IsDirectory && entry.Attributes != nil && entry.Attributes.Mime != ""
}

我被这段代码误导了,所以现在的逻辑是当 entry.IsDirectory==true 时通过 Mime!="" 判断是否同时是一个 s3文件,但这是有问题的,因为通过 S3 PubObject 上传的文件可能不包含 ContentType,导致Mime 为空字符串,这种情况下,如果对应的 entry 已存在且是个目录的话,无法通过 Mime 判断这个entry是否是 s3 文件;

为了能正确的识别一个directory entry 是否是 s3文件,是否可以加个 S3专用的 fs.FileType

var FileModeS3 fs.FileMode = 1 << (32 - 14) // 偏移的offset设置为14是为了和 fs 的其他 FileMode 区分开

当一个 entry 已经是 filer directory的时候,通过 entry.Mode |= FileTypeS3 来标识该 entry 是个 s3 文件

@shichanglin5
Copy link
Contributor Author

@chrislusf

@shichanglin5
Copy link
Contributor Author

at weed/pb/filer_pb/filer_pb_helper.go:20

func (entry *Entry) IsDirectoryKeyObject() bool {
	return entry.IsDirectory && entry.Attributes != nil && entry.Attributes.Mime != ""
}

我被这段代码误导了,所以现在的逻辑是当 entry.IsDirectory==true 时通过 Mime!="" 判断是否同时是一个 s3文件,但这是有问题的,因为通过 S3 PubObject 上传的文件可能不包含 ContentType,导致Mime 为空字符串,这种情况下,如果对应的 entry 已存在且是个目录的话,无法通过 Mime 判断这个entry是否是 s3 文件;

为了能正确的识别一个directory entry 是否是 s3文件,是否可以加个 S3专用的 fs.FileType

var FileModeS3 fs.FileMode = 1 << (32 - 14) // 偏移的offset设置为14是为了和 fs 的其他 FileMode 区分开

当一个 entry 已经是 filer directory的时候,通过 entry.Mode |= FileTypeS3 来标识该 entry 是个 s3 文件

当一个 entry 已经存在,且 entry.IsDirectory() 为true 的话,那么当该 entry 关联的 url 上传文件后会将 entry的mime 属性更新,默认为application/octor-stream

@shichanglin5 shichanglin5 marked this pull request as ready for review March 26, 2023 08:07
if mime != "" {
e.Mime = mime
} else {
e.Mime = "application/octet-stream"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this incurs unnecessary storage cost

@@ -309,3 +310,7 @@ func processMetadataBytes(reqHeader http.Header, existing map[string][]byte, rep

return
}

func IsS3FileEntry(entry *filer_pb.Entry) bool {
return !entry.IsDirectory || (entry.Attributes != nil && entry.Attributes.Mime != "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s3 directories has mime as "httpd/unix-directory"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is mime of s3 directory set to "httpd/unix-directory"? directory is only a logical concept in s3 api design
Here is to consider how to correctly handle s3 files when entry is both s3 files and directories, because directories do not have mime, and s3 files are generally mime (if not, they need to be set to the default application/octorstream). In this way, when an entry is a directory, we can know whether the entry is a file at the same time through mime.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kmlebedev any comments about setting the mime of s3 directory to "httpd/unix-directory"?

Signed-off-by: changlin.shi <changlin.shi@ly.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants