Initial Load Memory Grows Exponentially? #503

Open
bxcodec opened this issue Nov 6, 2023 · 3 comments
Comments

@bxcodec

bxcodec commented Nov 6, 2023

PGSync version: master branch, commit 9511670
Postgres version: 14.5
Elasticsearch version: latest version of OpenSearch
Redis version: latest
Python version: 3.8

Problem Description:

Hi,
I tried to run this on my local machine.

I have these environment variables set:

ELASTICSEARCH_PORT=9200
ELASTICSEARCH_SCHEME=http
ELASTICSEARCH_HOST=opensearch
ELASTICSEARCH=false
OPENSEARCH=true
ELASTICSEARCH_CHUNK_SIZE=1000
QUERY_CHUNK_SIZE=10000

Questions:

Why does memory grow exponentially during my initial load?
Is there any way to make the memory consumption stable?

[Screenshot: memory usage graph, 2023-11-06]

I have tried lowering the chunk size, e.g. to 1K and even 500, but I still hit the same issue on the initial load.

Error Message (if any):
The application was killed because of OOM:

Killed
exited with code 137

@accelq

accelq commented Nov 6, 2023

The QUERY_CHUNK_SIZE is not applied to the Postgres cursor, so SQLAlchemy keeps pulling data for as long as it can.

result = conn.execution_options(

You can add a new environment variable to control it:
https://github.com/accelq/pgsync/blob/f1d7caa95cf8edb30da03e05172e90bf7775b666/pgsync/base.py#L869

This worked for me, though.
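
For reference, a minimal sketch of what a streaming (server-side) cursor looks like with SQLAlchemy execution options; this is not PGSync's actual code, and the DSN, query, and process() handler are placeholders:

import sqlalchemy as sa

QUERY_CHUNK_SIZE = 10_000  # matches the env setting above

engine = sa.create_engine("postgresql://user:pass@localhost/db")  # placeholder DSN

with engine.connect() as conn:
    # stream_results uses a server-side cursor; yield_per sets the fetch batch size,
    # so only one chunk of rows is resident in memory at a time.
    result = conn.execution_options(
        stream_results=True,
        yield_per=QUERY_CHUNK_SIZE,
    ).execute(sa.text("SELECT * FROM big_table"))  # placeholder query
    for batch in result.partitions():
        process(batch)  # hypothetical handler for each batch of rows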

@bxcodec

bxcodec commented Nov 8, 2023

Is there any plan to add this functionality as a core feature? cc @toluaina

@sergiojgm

sergiojgm commented Nov 29, 2023

After debugging an initial load of 26MM records, I found the following: if you use ES parallel bulk (the default; streaming bulk exits on error) and failures/exceptions occur (e.g. records whose JSON structure differs between rows), then with ELASTICSEARCH_RAISE_ON_ERROR and ELASTICSEARCH_RAISE_ON_EXCEPTION (both true by default) every failed record stays resident in memory because of the unhandled error/exception. Processing continues and the failed records pile up until the process runs out of memory. Disable both options and memory will be stable; also review whether the data structure of all records matches what is being created in ES. Hope this helps ;) I was able to copy the 26MM records (250 GB) to ES in 3h35m with a maximum of 1.6 GB of RAM.
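
For context, a minimal sketch of how those two settings map onto the elasticsearch-py bulk helpers (this is an assumption about the mapping, not PGSync's actual code; the host, index, and documents are placeholders):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

client = Elasticsearch("http://localhost:9200")  # placeholder host

def actions():
    # placeholder documents; in practice these come from the Postgres rows
    yield {"_index": "my-index", "_source": {"field": "value"}}

for ok, item in parallel_bulk(
    client,
    actions(),
    raise_on_error=False,      # ELASTICSEARCH_RAISE_ON_ERROR=False
    raise_on_exception=False,  # ELASTICSEARCH_RAISE_ON_EXCEPTION=False
):
    if not ok:
        print("failed:", item)  # log and skip failed documents instead of accumulating and raising them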
