
The data structure problem with active objects #14613

Open
sfan5 opened this issue May 5, 2024 · 16 comments · May be fixed by #14631
Labels: Bug (Issues that were confirmed to be a bug), Performance (@ Server / Client / Env.)

Comments

@sfan5 (Member) commented May 5, 2024

There's a closely related older issue (#6453) but I felt like a fresh start here was better.

Problem

tl;dr: optimizing anything else server-side is basically a waste until this is solved, skip to the next section.

The nice people from the Your Land server sent me several profiling traces of their production server, and a common theme is that about 40% of the time (on the server thread) is spent just iterating the entity list (on average, not at peak!).
Now their mods use get_objects_inside_radius a bunch, but this is not the sole cause, and I expect any sizeable server will be plagued with this exact issue.

The issue is spatial queries on the entity list, which happen more often than one might think:

  • Lua get_objects_inside_radius, get_objects_in_area
  • Entity collision detection (sometimes)
  • Raycasts (sometimes)
  • ServerEnvironment::getAddedActiveObjects

All of these are O(n) with n being the number of entities. When you consider that every entity does collision detection this becomes O(n²) right in the main thread.
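For illustration, the current pattern amounts to a linear scan over the whole entity container on every spatial query. A minimal sketch (types and names here are hypothetical, not the engine's actual code):

```cpp
// Sketch of the O(n) pattern described above: every spatial query walks
// the full entity map. If every entity runs collision detection, that is
// O(n^2) per server step, all on the main thread.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };
struct Entity { uint16_t id; Vec3 pos; };

std::vector<uint16_t> objectsInsideRadius(
        const std::unordered_map<uint16_t, Entity> &objects,
        const Vec3 &center, float radius)
{
    std::vector<uint16_t> result;
    for (const auto &[id, e] : objects) { // linear scan: O(n)
        float dx = e.pos.x - center.x;
        float dy = e.pos.y - center.y;
        float dz = e.pos.z - center.z;
        if (dx * dx + dy * dy + dz * dz <= radius * radius)
            result.push_back(id);
    }
    return result;
}
```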

Solution requirements

What we need is a new data structure.

It must:

  • support fast lookup by entity ID
  • be iterable in the order of entity ID
  • support deletion and insertion
  • support fast spatial AABB queries (= getObjectsInArea) on the entity position
  • efficiently handle entity position updates, as these happen often
  • position updates must reflect in lookups instantly

external circumstances:

  • there is almost always opportunity for positions to change between lookups
    • this means any kind of "one-off cache" (that needs to be thrown away once an update happens) is unlikely to produce any hits
  • spatial distance queries (= getObjectsInsideRadius) can be emulated on top of AABB queries and are not an important point
  • being safe from modification during iteration is also important (Use performant and safe data structure for active objects #13880) but I think we can tack that on if the rest is addressed

Finally, we will also need to figure out how to reliably propagate position updates up to the container.
There are many ideas of the form "just add a cache to <thing>" that come to mind, but keeping such a cache in sync is generally trickier than designing the structure itself.

Real-world example data

(thanks to @Bastrabun)
There are 4500 entities. About 1400 move often, 3100 are static.

@appgurueu (Contributor) commented May 5, 2024

I am currently working on a solution to this. The data structure I'm implementing is a dynamization of k-d-trees, meaning it would store O(log n) many k-d-trees with sizes of powers of two. This means:

  • Amortized O(log n) insertions and O((log n)²) deletions
  • O(log(n) * n^(2/3) + k) worst case range queries (assuming distinct object positions) where k is the number of results, average case should be better
    • To give an example of how good the performance can be, a near-point-query should have O((log n)²) time complexity.
  • Raycast queries can get special optimization. The basic "bounding area" optimization also helps a bit.

Josiah and I tried optimizing this by using an off-the-shelf R-tree library a while ago. This failed due to bad constant factors for our medium-size workloads. It's also a bit overkill: It is centered around boxes. It would have made supporting higher variance in box dimensions easier. But currently we don't need that; we can just approximate the relatively small objects as points for now.

I'm optimistic that with this new approach, I have a good chance of getting the constant factors low enough since there are plenty of knobs to tweak:

  • There are three possible ways to construct k-d-trees;
  • I can use a vector for up to, say, 32 or 64 entity ID - position pairs, and only use k-d-trees for more entries;
  • There are various ways of tweaking the "dynamization" data structure. For example we could use only a single k-d-tree, which would worsen updates to O(sqrt n log n), but improve queries to O(sqrt n + k) worst case (and much better in average case). This would require k-d-trees to be able to get "out of shape" though, so storing them in arrays wouldn't be an option anymore; we would need more allocations.
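A minimal sketch of the dynamization technique being described (the "logarithmic method": a forest of static structures whose sizes are powers of two, merged like a binary counter increment on insertion). Sorted vectors stand in for built k-d-trees here; all names are illustrative:

```cpp
// Dynamization sketch: slot i holds either nothing or a static structure
// of exactly 2^i elements. Insertion carries full slots upward, so each
// element is rebuilt O(log n) times amortized; queries consult all
// O(log n) static structures.
#include <algorithm>
#include <cstddef>
#include <vector>

struct StaticForest {
    std::vector<std::vector<int>> slots; // sorted vectors stand in for k-d-trees

    void insert(int p) {
        std::vector<int> carry{p};
        for (size_t i = 0;; ++i) {
            if (i == slots.size())
                slots.emplace_back();
            if (slots[i].empty()) {
                // "rebuild" the static structure for this slot
                std::sort(carry.begin(), carry.end());
                slots[i] = std::move(carry);
                return;
            }
            // slot occupied: merge its contents and carry into the next slot
            carry.insert(carry.end(), slots[i].begin(), slots[i].end());
            slots[i].clear();
        }
    }

    bool contains(int p) const {
        for (const auto &s : slots) // query each static structure
            if (std::binary_search(s.begin(), s.end(), p))
                return true;
        return false;
    }
};
```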

@ExeVirus (Contributor) commented May 5, 2024

I've had an idea tossing around for a while on this subject; it would need optimization, but generally:

  • an object-ID vector of pointers (std::vector)
  • a std::vector of unused entries
  • a std::unordered_map with three levels, each level representing an octree subdivision

The bottom layer corresponds to the 16,16,16 mapblock boundary (to align with when entities need to be put to sleep or woken up). This bottom layer is two std::vectors to hold unlimited entities, with the same on-erase mechanics as the main std::vector list of objects.

So, you have two lookup methods: by Octree bin (basically x,y,z) and by id in a vector.

When insertions occur in the vector, they are mirrored in the unordered maps. When position updates occur, we just need to verify the object hasn't changed mapblock; if it has, erase its entry in the old mapblock's vector and insert it into the new one. So the tradeoff is that position updates are more expensive.

Finally, the main unordered map stores pointers, at least at the highest level, so as to keep RAM usage down. When I last looked at it, I think it was about 15 MB for the whole optimization stack at startup, with a couple kilobyte allocations when the first entities are inserted to mapblocks.

Note: I will be updating this comment as I do an actual design for this task; I might be able to actually do this one.

Update 1: Nice, we already have an object manager class using a std::map structure with garbage collection, so that takes care of my first vector manager. I wonder why we need to use maps, though? Couldn't the vector index be the object ID?

Update 2:

The locations that loop over all objects a lot during runtime are:

```cpp
// In activeObjectManager.cpp

// Get vector sorted by distance around origin point within max distance
ActiveObjectMgr::getActiveObjects(const v3f &origin, f32 max_d, std::vector<DistanceSortedActiveObject> &dest)

// Raycasts
ActiveObjectMgr::getActiveSelectableObjects(const core::line3d<f32> &shootline)

// Objects inside radius
ActiveObjectMgr::getObjectsInsideRadius(const v3f &pos, float radius, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)

// Objects inside area
ActiveObjectMgr::getObjectsInArea(const aabb3f &box, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)

// Maintain inactive/active objects
ActiveObjectMgr::getAddedActiveObjectsAroundPos(v3f player_pos, f32 radius, f32 player_radius, const std::set<u16> current_objects, std::vector<u16> &added_objects)
```

@sfan5 (Member, Author) commented May 5, 2024

So I took a walk and here's my idea:

It is based on the "one-off cache" approach I said wouldn't work, but uses some tricks to make it workable.
It also presumes a reliable mechanism exists by which the container is notified when an object changes position.

Keep the ModifySafeMap<u16, SAO> as base data structure.
A cache map is created as a std::multimap<float = Z pos of object, u16 = object ID>.
A list/vector/set of object IDs is created called uncached.

The gist of it is that objects exist in either cached (fast lookup) or uncached (linear lookup as before). Since there is no requirement that the cache is accurate at all times we can cleverly avoid (defer) rebuild costs.

When an object is inserted it is put into the uncached list.
When an object is deleted it is removed from the uncached list.
When an object changes position it is added to the uncached list.

The cache has to be (re-)created at some point. This should happen when the uncached list gets too large, let's say if it's at least 64 entries and >=10% of all objects.

When an AABB query happens, the cache is consulted (which speeds up the query, albeit only in one dimension). The uncached list is also consulted and takes priority over what the cache returned for a given object ID.

Spatial lookup performance: O(log(n) + m) (n = cached.size(), m = uncached.size()).
The cache rebuild that needs to happen once in a while is O(n log n) (n = all objects).
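A rough sketch of how the cached/uncached split could look, assuming the types described above (a multimap keyed by Z position plus an uncached set). This is not engine code, just an illustration of the query path:

```cpp
// Sketch of the deferred-cache scheme: the multimap narrows candidates by
// Z, the uncached set is scanned linearly against fresh positions and
// takes priority over (possibly stale) cache entries.
#include <cstdint>
#include <map>
#include <set>
#include <unordered_map>
#include <vector>

struct Pos { float x, y, z; };

struct ObjectIndex {
    std::unordered_map<uint16_t, Pos> all; // stands in for the base structure
    std::multimap<float, uint16_t> cached; // Z pos -> object ID, may be stale
    std::set<uint16_t> uncached;           // inserted/moved since last rebuild

    void rebuildCache() { // O(n log n), run when uncached grows too large
        cached.clear();
        for (const auto &[id, p] : all)
            cached.emplace(p.z, id);
        uncached.clear();
    }

    std::vector<uint16_t> queryZRange(float zmin, float zmax) const {
        std::vector<uint16_t> out;
        for (auto it = cached.lower_bound(zmin), end = cached.upper_bound(zmax);
                it != end; ++it) {
            uint16_t id = it->second;
            // skip stale entries: moved objects are handled via uncached,
            // deleted objects are dropped entirely
            if (uncached.count(id) || !all.count(id))
                continue;
            out.push_back(id);
        }
        for (uint16_t id : uncached) { // linear part: O(m), fresh positions
            auto p = all.find(id);
            if (p != all.end() && p->second.z >= zmin && p->second.z <= zmax)
                out.push_back(id);
        }
        return out;
    }
};
```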


let's go through the checklist:

  • lookup by ID: defer to base structure
  • iterable in ID-order: defer to base structure
  • deletion and insertion: no problem, described above
  • spatial queries: yep, this is what the cache is for
  • efficiently handle updates: virtually no cost
  • updates apply instantly: yes

bonus:

  • safe from modification during iteration: no iterator into the two new structures needs to be held, plus the base structure is already safe
  • bookkeeping does not require knowing from where objects have moved, just that they have moved

caveats:

  • doesn't work if too many objects constantly move

If you read closely you will notice that the requirements for the cache structure are very minimal.
It literally only needs to:

  • be constructable
  • support spatial queries

It can be transparently replaced with any other (better) data structure.

@Bastrabun

Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ...

For reference YL bugtracker 3723, 6325, 6326, 5289, administration 188

@ExeVirus (Contributor) commented May 5, 2024

Sfan: Why is just Z-position enough to cache? Does that speed up any of our lookups by position by enough to matter?

Also, why are position updates no cost?

@sfan5 (Member, Author) commented May 5, 2024

Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ...

I think the most likely next step when someone has working code is for you to test the performance in a real situation.
What would be useful to know right now however is: how many active objects do you have usually?

Sfan: Why is just Z-position enough to cache?

It may not be, but see the last section.

Also, why are position updates no cost?

After updating an object the cost is paid during every following spatial query, that's the m component in the big-O notation.

@ExeVirus (Contributor) commented May 5, 2024

Sfan: Why is just Z-position enough to cache?

It may not be, but see the last section.

Oh, that makes perfect sense. Okay yeah I can see a way to get my idea to fit in your world relatively painlessly.

Just take int16(x) >> 4, int16(y) >> 4 and int16(z) >> 4 and concatenate them. That will put all entities into mapblock-sized bins, and the key easily fits in 64 bits.

Should still yield nice speedups for objects-in-radius, raycasts, and whatnot; we just have to write the algorithms for those lookups. It could be as simple as running the same algorithms over the 16x16x16 bins first and iterating over the relevant bins instead of the whole active object list like we currently do.

Bin size should be played with, obviously, because 1x1x1 bins wouldn't yield much value, I suspect.
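A sketch of one way to pack the three mapblock coordinates into a 64-bit key. Note that OR-ing the three shifted values directly into the same bit positions would collide, so each coordinate gets its own 16-bit field (illustrative only, not a proposed engine API):

```cpp
// Mapblock bin key: each 16-bit mapblock coordinate occupies its own bit
// field, so no two distinct mapblocks map to the same key.
#include <cstdint>

uint64_t mapblockKey(int16_t x, int16_t y, int16_t z) {
    // arithmetic shift by 4 divides by 16 (the mapblock size), rounding
    // toward negative infinity, which is what we want for negative coords
    uint16_t bx = static_cast<uint16_t>(x >> 4);
    uint16_t by = static_cast<uint16_t>(y >> 4);
    uint16_t bz = static_cast<uint16_t>(z >> 4);
    return (uint64_t(bx) << 32) | (uint64_t(by) << 16) | uint64_t(bz);
}
```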

@fluxionary (Contributor)

I tried to take a stab at a better data structure for this. The result, in Lua, is here: https://github.com/fluxionary/minetest-futil/blob/main/data_structures/point_search_tree.lua. The biggest issue with it is that it can't be re-balanced when new points are added or removed. I found a paper about that but never got around to implementing it: https://arxiv.org/abs/1410.5420

@appgurueu (Contributor)

I also have a Lua implementation of k-d-trees. I think trying to rebalance k-d-trees is a dead end. I tried for a while, it doesn't seem to really be feasible to me.

But it is possible to build an efficient dynamic data structure out of a forest of static k-d-trees (see "transforming static data structures to dynamic structures"), which is what I'm suggesting here and currently implementing. I can't tell whether this will hold up in real world usage, but implementing and testing it is the only way to find out. I'm optimistic that we can get decent constant factors because there are plenty of knobs to tweak.

@ExeVirus (Contributor) commented May 6, 2024

Thought through sfan5's idea more (worked on it). There's also a remove operation (for when an object is no longer active) that is also O(log(n) + m).

@appgurueu (Contributor) commented May 6, 2024

Alright, I've got a seemingly working (according to a quick randomized test against a naive implementation) dynamic forest of k-d-trees data structure whipped up, feel free to take a peek at my branch. The big O for this should be good (though not quite as good as what I advertised initially; for example deletion is amortized O(log(n)²) rather than O(log n) with this implementation), what remains to be seen is how it holds up in practice (potentially after some optimization), and integration into the active object manager, which I may get to on the weekend.

@ExeVirus (Contributor) commented May 7, 2024

Will take a look. sfan5's idea was short enough that I jumped right into the server ActiveObjectMgr integration; the only non-straightforward thing is how to signal object position updates. In mine, I just check whether the last position differs from the new position after step().

I will probably not write tests for your structure directly, as MT is a much better stress test of complexity than I can imagine haha, but I can at least look for obvious things.

Not quite done with my version, but perhaps ready for integration tests against servers/mods by end of week.
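A sketch of that movement check, assuming mapblock-sized bins as discussed above: compare the mapblock coordinates before and after step() and only touch the spatial index when they differ (names are hypothetical, not engine code):

```cpp
// Movement detection sketch: an object only needs re-binning when it
// crosses into a different 16-node mapblock, so comparing mapblock
// coordinates before and after step() avoids needless index updates.
#include <cmath>
#include <cstdint>

struct V3 { float x, y, z; };

// mapblock coordinate of a world coordinate (16-node blocks);
// std::floor handles negative coordinates correctly
inline int16_t blockCoord(float v) {
    return static_cast<int16_t>(static_cast<int>(std::floor(v)) >> 4);
}

inline bool sameMapblock(const V3 &a, const V3 &b) {
    return blockCoord(a.x) == blockCoord(b.x) &&
           blockCoord(a.y) == blockCoord(b.y) &&
           blockCoord(a.z) == blockCoord(b.z);
}
```

The caller would record the position before step(), call sameMapblock afterwards, and only erase/reinsert the bin entry when it returns false.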

@ExeVirus (Contributor) commented May 8, 2024

Okay, first draft version working correctly? I'm certain I have missed spots where an object position changes and I need to invalidate the cache, such as object:set_pos(), but I have data to back up that I have a working solution, albeit only 10% optimized.

Here is the branch:

I am testing with a custom mod with devtest, this is the init.lua:

```lua
minetest.register_chatcommand("foo", {
    privs = {
        interact = true,
    },
    func = function(name, param)
        for i = 1, 100 do
            local z = i - 50
            local obj = minetest.add_entity({x = i - 50, y = 20, z = z}, "entity_test:test_entity", nil)
            obj:set_hp(z + 1)
        end
        return true
    end,
})

local function getObjectsInAreaTest()
    local before = minetest.get_us_time()
    local list = {}
    for i = 1, 10 do
        list = minetest.get_objects_in_area({x = 0, y = 0, z = -50}, {x = 50, y = 50, z = 0})
    end
    local after = minetest.get_us_time()
    minetest.chat_send_all(#list .. "   " .. after - before)
    minetest.log(#list .. "," .. after - before)
end

local function callEveryHalfSecond()
    getObjectsInAreaTest()
    minetest.after(0.5, callEveryHalfSecond) -- schedule itself to run again in 0.5 seconds
end

minetest.register_chatcommand("getList", {
    privs = {
        interact = true,
    },
    func = callEveryHalfSecond,
})

-- Define the test_entity (optional, replace with your desired entity definition)
minetest.register_entity("entity_test:test_entity", {
    description = "Test Entity",
    initial_properties = {
        hp_max = 200,
        physical = true,
        collide_with_objects = false,
        collisionbox = {-0.3, -0.3, -0.3, 0.3, 0.3, 0.3},
        visual = "wielditem",
        visual_size = {x = 0.4, y = 0.4},
        textures = {""},
        spritediv = {x = 1, y = 1},
        initial_sprite_basepos = {x = 0, y = 0},
    },
    on_step = function(self)
        local pos = self.object:get_pos()
        pos.z = -50 + math.fmod(minetest.get_us_time() / 30000 + self.object:get_hp() - 51, 100)
        self.object:set_pos(pos)
    end,
})
```
  1. I start new devtest world, flat mapgen
  2. Go to 0,10,0
  3. /clearobjects
  4. /foo (creates the 100 cycling entities)
  5. /getList (every half second, spits out the number of entities found and how long it took to find them all 10x in a row)

Here is a GIF of what that test mod and integration test look like in-game:

[GIF: 101 Entity Test]

Then I went ahead and plotted the results:

[plot: getObjectsInArea timings]

You'll notice that even though there are 101 entities flying around, the actual time to run getObjectsInArea 10 times varies significantly based on where the entities are in relation to the query's AABB.

Take a look at my implementation: I just always check the mapblocks up to the max AABB extents, so it's sub-optimal when requesting a spherical radius, because I still check all mapblocks from -radius to +radius on all 3 axes centered on the origin point.

Still, I'm surprised it was only about ~80 LoC for the structure. (Also notice I set the mapblock size as my bin size; perhaps smaller than 16, like 8, might be better. I wouldn't know without field testing, and it'll be game-specific, actually.)

@appgurueu linked a pull request May 9, 2024 that will close this issue
@Desour (Member) commented May 11, 2024

Some feedback / thoughts:

  • No need to be perfect in terms of performance. Any solution is acceptable for now, as it will outperform the current situation anyways.
  • luatic: You could use bounding volume hierarchies (BVHs) in place of k-d-trees.
    A big benefit would be that you can do refitting, instead of removing and inserting an object whenever it moves a tiny bit.
    For the BVH construction you can, for example, do the same splitting as for k-d-trees, you just have to also keep track of the bounding boxes.
    You don't have to be worried about the overlap: With k-d-trees you have to make your query AABB bigger by the upper bound of all object volumes. This makes the k-d-tree nodes effectively BVH nodes that always overlap by this upper bound. Actual BVH nodes can't have more overlap than that.
  • Do we actually currently have an upper bound for object collision/selection boxes? Because IIRC for selection boxes the answer is no. (Edit: Oh right, we're just worried about collision boxes.) Please make sure to properly support big objects, e.g. in the worst case by inserting them into a separate vector if they're too big.
  • You can do tree rebuilding in a separate thread, to get it out of the hot path.
    This would go especially well with sfan5's approach (which, if I didn't misunderstand, is basically a non-hierarchical version of what luatic does (aka what the linked paper describes)).
  • I do like ExeVirus's approach of using a hash map to mapblock-sized boxes. The good thing is that it makes the run time independent of total number of objects, instead it only depends on the number of objects nearby.
    However, I don't know yet what the best way would be to make it work well in arbitrary situations, like very big objects (membership needs updating in so many mapblocks), very big query boxes (so many mapblocks to look into), or one mapblock containing many objects (it would be good to have a data structure in there too then, but which?). I'm thinking of something like an octree with a hashmap at each depth. But I can't come up with anything hierarchical that works well in generic cases and is also simple.
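For reference, a minimal sketch of the BVH refitting idea mentioned above: keep the tree topology fixed and recompute internal bounding boxes bottom-up after leaves move, instead of removing and reinserting objects. Structure and names are illustrative only:

```cpp
// BVH refit sketch: after leaf boxes are updated from the objects'
// current positions, one bottom-up pass restores the invariant that each
// internal node's box contains both children's boxes.
#include <algorithm>
#include <vector>

struct AABB { float min[3], max[3]; };

inline AABB merge(const AABB &a, const AABB &b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

struct Node {
    AABB box{};
    int left = -1, right = -1; // -1 children means this is a leaf
};

// Recompute internal boxes bottom-up via recursion; leaf boxes are
// assumed to have been updated already from the objects' positions.
void refit(std::vector<Node> &nodes, int idx) {
    Node &n = nodes[idx];
    if (n.left < 0)
        return; // leaf
    refit(nodes, n.left);
    refit(nodes, n.right);
    n.box = merge(nodes[n.left].box, nodes[n.right].box);
}
```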

@ExeVirus (Contributor) commented May 11, 2024 via email

@ExeVirus (Contributor)

Alrighty, there's my current merge^^^

Here's the performance results:

[plot: performance results]

Summary for the lazy:

  • Exactly the performance gains you'd expect from a spatial map: scales fantastically when getting entities in radius/range (which happens for all collisions, I found out).
  • Has no effect on object creation - it is dwarfed by the cost to allocate memory for the new object
  • Removing existing objects is slower than I expected. Not too expensive, but still more than anticipated; I suspect I'm not handling deletion quite right and am trying to double-delete entries or something. It still doesn't really matter in the grand scheme of things: it's blazing fast compared to entity creation.
  • When updating the positions of all 10,000 entities to completely random far-away locations, the performance of those updates dropped, taking ~40% longer than before. Though I will point out that seldom will 10,000 entities change 16x16x16 mapblocks every frame. We would need some sort of small-move benchmark to measure those speeds, but either way it should perform extremely similarly to the current code under normal circumstances, with a worst case of +40% as seen here.
