Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc sort improvements #1706

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

q3cpma
Copy link

@q3cpma q3cpma commented May 7, 2024

Hello, first contribution and first time touching Go here, so don't be nice and roast me well if I did some dumb things.

I was looking at the sort function because I want to implement some features and noticed some less than optimal code, be it in performance or "looking nice to me".
I didn't add any dependency (though text isn't indirect anymore), but https://github.com/jeandeaual/go-locale might be a good idea to use the system locale instead of hardcoding English. The other improvement that might be a good idea is to use slices.SortStableFunc, as recommended by Go; though it's only builtin starting with 1.22.

Here are the performance improvements I measured by timing sort and logging
Before:

sort duration for directory /home (numfiles: 1): 8.4µs
sort duration for directory /home/q3cpma (numfiles: 21): 445.601µs
sort duration for directory /home/q3cpma/Programming/externe/lf/build (numfiles: 1): 6.141µs
sort duration for directory /home/q3cpma/Programming/externe/lf/etc (numfiles: 19): 22.72µs
sort duration for directory /home/q3cpma/Programming/externe/lf/gen (numfiles: 4): 14.62µs
sort duration for directory /home/q3cpma/Programming/externe/lf (numfiles: 42): 156.22µs
sort duration for directory /home/q3cpma/Programming/externe/lf/test (numfiles: 100000): 55.838499ms
sort duration for directory /home/q3cpma/Programming/externe (numfiles: 25): 109.811µs
sort duration for directory /home/q3cpma/Programming (numfiles: 44): 286.64µs
sort duration for directory / (numfiles: 18): 31.15µs

After:

sort duration for directory /home (numfiles: 1): 41.52µs
sort duration for directory /home/q3cpma (numfiles: 21): 95.641µs
sort duration for directory /home/q3cpma/Programming/externe/lf/build (numfiles: 1): 12.6µs
sort duration for directory /home/q3cpma/Programming/externe/lf/etc (numfiles: 19): 33.01µs
sort duration for directory /home/q3cpma/Programming/externe/lf/gen (numfiles: 4): 25.06µs
sort duration for directory /home/q3cpma/Programming/externe/lf (numfiles: 45): 103.98µs
sort duration for directory /home/q3cpma/Programming/externe/lf/test (numfiles: 100000): 31.911716ms
sort duration for directory /home/q3cpma/Programming/externe (numfiles: 25): 49.76µs
sort duration for directory /home/q3cpma/Programming (numfiles: 44): 65.4µs
sort duration for directory / (numfiles: 18): 32.79µs

The large test directory contains touch test/{1..100000} as a more throughput oriented benchmark, since the other directories are basically instantaneously sorted.

Copy link
Collaborator

@joelim-work joelim-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your PR submission. Setting aside performance, I think it is a good idea to use this collate package anyway as it makes the code more readable.

That being said, I tried your branch very briefly and found that if dirfirst is disabled then the sorting is skipped entirely since sort.SliceStable is never called. Please do more extensive testing to ensure the various sorting options are covered.

nav.go Show resolved Hide resolved
@q3cpma
Copy link
Author

q3cpma commented May 8, 2024

Thanks for the quick review, and you're right, I was a bit tired when I made this.

nav.go Outdated
Comment on lines 229 to 270
case naturalSort:
sort.SliceStable(dir.files, func(i, j int) bool {
s1, s2 := normalize(dir.files[i].Name(), dir.files[j].Name(), dir.ignorecase, dir.ignoredia)
if !dir.reverse {
return naturalLess(s1, s2)
} else {
return naturalLess(s2, s1)
}
})
lessfun = func(i, j int) bool {
return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
}
case nameSort:
sort.SliceStable(dir.files, func(i, j int) bool {
s1, s2 := normalize(dir.files[i].Name(), dir.files[j].Name(), dir.ignorecase, dir.ignoredia)
if !dir.reverse {
return s1 < s2
} else {
return s2 < s1
}
})
lessfun = func(i, j int) bool {
return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two cases have the same code, so you should be able to combine them:

	case nameSort, naturalSort:
		lessfun = func(i, j int) bool {
			return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
		}

It's probably also worth putting a comment stating that naturalSort is already handled above.

This actually raises the question of whether natural sorting should have its own option, instead of being lumped under sortby, since it is really a sorting modifier and not a property (e.g. name, size, time) that is used as a basis for sorting. But changing the configuration options is a breaking change, so I think it is not worth worrying about for now.

nav.go Outdated
Comment on lines 348 to 312
// Finally sort after filtering it all
sort.SliceStable(dir.files, lessfun)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought this through, but does it make sense to do all the filtering (i.e. dironly, hidden, filter) first before doing any sorting? It would make the logic cleaner if all of the filtering could be moved to the top of this function, but I don't know if it will cause any issues.

Since you are pretty much rewriting the entire function, it's probably a good time to clean it up as much as possible.

@joelim-work
Copy link
Collaborator

I bumped some dependencies yesterday, looks like it causes a merge conflict.

@joelim-work
Copy link
Collaborator

OK so I decided to play around with this a bit more, and I wrote the following code in a rush, didn't test it too much yet. I wonder if this might be more desirable:

func (dir *dir) sort() {
	dir.sortby = getSortBy(dir.path)
	dir.dirfirst = getDirFirst(dir.path)
	dir.dironly = getDirOnly(dir.path)
	dir.hidden = getHidden(dir.path)
	dir.reverse = getReverse(dir.path)
	dir.hiddenfiles = gOpts.hiddenfiles
	dir.ignorecase = gOpts.ignorecase
	dir.ignoredia = gOpts.ignoredia

	dir.files = dir.allFiles

	filterFiles := func(files []*file, f func(*file) bool) (result []*file) {
		for _, file := range files {
			if f(file) {
				result = append(result, file)
			}
		}
		return result
	}

	if dir.dironly {
		dir.files = filterFiles(dir.files, func(file *file) bool {
			return file.IsDir()
		})
	}

	if !dir.hidden {
		dir.files = filterFiles(dir.files, func(file *file) bool {
			return !isHidden(file, dir.path, dir.hiddenfiles)
		})
	}

	if len(dir.filter) != 0 {
		dir.files = filterFiles(dir.files, func(file *file) bool {
			return !isFiltered(file, dir.filter)
		})
	}

	collopts := []collate.Option{}
	if dir.ignorecase {
		collopts = append(collopts, collate.IgnoreCase)
	}
	if dir.ignoredia {
		collopts = append(collopts, collate.IgnoreDiacritics)
	}
	if dir.sortby == naturalSort {
		collopts = append(collopts, collate.Numeric)
	}
	coll := collate.New(language.Und, collopts...)

	var lessfun func(i, j int) bool

	switch dir.sortby {
	case nameSort, naturalSort:
		lessfun = func(i, j int) bool {
			return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
		}
	case sizeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].TotalSize() < dir.files[j].TotalSize()
		}
	case timeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].ModTime().Before(dir.files[j].ModTime())
		}
	case atimeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].accessTime.Before(dir.files[j].accessTime)
		}
	case ctimeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].changeTime.Before(dir.files[j].changeTime)
		}
	case extSort:
		lessfun = func(i, j int) bool {
			cmp := coll.CompareString(dir.files[i].ext, dir.files[j].ext)
			return cmp == -1 || cmp == 0 && coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
		}
	}

	if dir.reverse {
		oldlessfun := lessfun
		lessfun = func(i, j int) bool {
			return oldlessfun(j, i)
		}
	}

	if dir.dirfirst {
		oldlessfun := lessfun
		lessfun = func(i, j int) bool {
			if dir.files[i].IsDir() == dir.files[j].IsDir() {
				return oldlessfun(i, j)
			}
			return dir.files[i].IsDir()
		}
	}

	sort.SliceStable(dir.files, lessfun)

	dir.ind = max(dir.ind, 0)
	dir.ind = min(dir.ind, len(dir.files)-1)
}

@q3cpma
Copy link
Author

q3cpma commented May 9, 2024

Thanks, this was actually in my head at some point, but I was a bit wary of doing changes too big with my lack of knowledge of Go and its slices. Good thinking about language.Und, didn't see it.

I included your changes except that I made filtering go through the file list only once.

@q3cpma q3cpma force-pushed the sort-misc-improvements branch 2 times, most recently from f138d49 to abfb1fd Compare May 9, 2024 21:52
@joelim-work
Copy link
Collaborator

I didn't realize it was possible to combine filters in the way you have done, nice.

Anyway the code looks quite good now There's a minor issue with formatting that is causing the CI build to fail, but otherwise it is probably close to being in a state where it can be merged.

Just out of interest how much of an increase in performance is there with the new changes?

@q3cpma
Copy link
Author

q3cpma commented May 10, 2024

Not very different:

Before:

/                                          18      30.99µs
/home                                      2       7.24µs
/home/q3cpma                               73      416.421µs
/home/q3cpma/Programming                   44      369.791µs
/home/q3cpma/Programming/externe           30      133.94µs
/home/q3cpma/Programming/externe/lf        46      101.62µs
/home/q3cpma/Programming/externe/lf/build  1       7.11µs
/home/q3cpma/Programming/externe/lf/etc    19      42.27µs
/home/q3cpma/Programming/externe/lf/gen    4       19.31µs
/home/q3cpma/Programming/externe/lf/test   100000  58.616475ms

After:

/                                          18      26.2µs
/home                                      2       34.72µs
/home/q3cpma                               73      29.87µs
/home/q3cpma/Programming                   44      75.88µs
/home/q3cpma/Programming/externe           30      53.9µs
/home/q3cpma/Programming/externe/lf        45      41.05µs
/home/q3cpma/Programming/externe/lf/build  1       17.97µs
/home/q3cpma/Programming/externe/lf/etc    19      33.35µs
/home/q3cpma/Programming/externe/lf/gen    4       18.66µs
/home/q3cpma/Programming/externe/lf/test   100000  26.947215ms

Basically, a little overhead for small directories, but in the absolute, nothing. Significant gains for the big ones (not even counting the dirfirst skipped with dironly trick or less to sort if a lot is filtered).

@joelim-work
Copy link
Collaborator

Sorry to trouble you, but I did some more testing and I found that the ignorecase and ignoredia options no longer work when they're disabled, at least on my machine. I did some investigation and it seems like collate.New seems to ignore case and diacritics by default.

The following examples all print -1 (i.e. left < right):

c := collate.New(language.Und)
fmt.Println(c.CompareString("a", "B")) // "B" should come first since it is uppercase
c := collate.New(language.Und, collate.IgnoreCase)
fmt.Println(c.CompareString("a", "B"))
c := collate.New(language.Und)
fmt.Println(c.CompareString("é", "f"))  // "é" should come last since it is a special character
c := collate.New(language.Und, collate.IgnoreDiacritics)
fmt.Println(c.CompareString("é", "f"))

I'm not sure how to go about solving this though, I'm haven't used the collate package all that much.

@q3cpma
Copy link
Author

q3cpma commented May 10, 2024

Not troubling me, I'm quite happy you're doing so much testing (I should be doing!).

Filed golang/go#67296 for now, because it sure as hell looks like a bug to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants