[Feature] Faster, if unsafe, dbt-compile please! (perhaps without connectors) #10155

guyr-ziprecruiter · 2024-05-16T11:50:59Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

A simple dbt compile run may take several minutes. I gather this is since the connectors are applied and things are verified for correctness (e.g. are the tables and columns referenced really are there). That's wonderful! But if I'm running the same dbt-compile multiple times, I'm willing to take the risk and assume nothing has changed. Hence I suggest a run mode in which whatever collected by the connectors gets cached, and I can reuse it instead of waiting 3 minutes every time I try to compile my model to Athena. Thanks!

Describe alternatives you've considered

Writing a jinja2 template rendered myself, and using jinja2 without dbt. Both did not work.

Who will this benefit?

DBT users compiling their queries to test (e.g. in the Athena console) and wishing not to spend hours waiting (it adds up!).

Are you interested in contributing this feature?

Not impossible if no one better is willing to!

Anything else?

No response

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2024-05-16T13:04:44Z

Hey @guyr-ziprecruiter!

First: Are you regularly running dbt compile for all the models in your project? If so, what's the reason why you're doing this? I wonder if you're running compile when a simple parse might do, or if you instead want to just do something like dbt compile --select specific_model.

As far as your proposal:

We already support an "unsafe" mode that disallows introspective queries against the data warehouse during compilation. It never makes a connection, but it will also fail gracelessly if you have code attempting to do this: dbt compile --no-introspect
The idea of caching DWH responses to use for subsequent queries is one that we've been experimenting with recently, for a different use case: our own testing/validation of dbt-core/adapters. We haven't documented this yet, because it's still in early development and far from stable — but it could offer some components that are extensible to what you're describing, or at least entry-points that might inspire your own implementation. We'll write more about this in the coming months.

guyr-ziprecruiter · 2024-05-19T06:39:13Z

Hello again!
I need the SQL query created by the model. So dbt parse is not applicable in my use case if I'm not mistaken.
I switched to --no-introspect:

$ dbt compile --no-introspect --profiles-dir=profiles/ --profile=athena --models foo

But it still took a couple of minutes. So seems like that's not what I was looking for.

Ideally I would like to work with no connector, just get the jinja templates to render and get my query back. Is that possible?

Thanks!

jtcohen6 · 2024-05-21T16:51:14Z

@guyr-ziprecruiter Thanks for giving it a try! In that case, I'm not sure what the source of slowness would be. Two questions:

How many --threads are you running dbt compile with?
How many models are being selected by --models foo?
If you take a look in the debug-level logs (logs/dbt.log), what does it look like dbt is spending most of the time doing?

guyr-ziprecruiter · 2024-05-22T08:52:53Z

Threads:

I'm not using --threads. My profiles.yml says 4 threads.

python -c 'from multiprocessing import cpu_count as cc;print(cc())' says 12.

This did seem to help a bit - thanks!

It might be nice if dbt compile would suggest an better number (if there is such a way to tell it).

How many models?

grep -c '{{ ref(' foo.sql gives 3. But how many models do these models refer, recursively? I can't tell. That might actually be a nice thing the tool can suggest.

Logs:

I executed:

$ dbt --debug --log-cache-events --log-path ~/dbtlogs/ compile \
     --threads 12 \
    --profiles-dir=profiles/ --profile=athena \
    --models itm_applies_revenue_attribution > /dev/null

and got me 55MB worth of logs (8.1MB gzipped). So let me get back to you about it :)

Again - a built in profiler seems like a worthy addition if these are the questions arising. I'll post my results soon. Thanks!

guyr-ziprecruiter · 2024-05-22T12:25:24Z

I wrote a dbt logs parser in Python (dbtlogs.py.txt) and came up with this:

action	estimated_duration	percent
dump after adding	0 days 00:00:04.261488	49.64%
adding relation	0 days 00:00:02.935445	34.19%
dump before adding	0 days 00:00:01.375737	16.02%
Connection	0 days 00:00:00.006651	0.08%
Timing info for model.main...	0 days 00:00:00.001911	0.02%
Sending event	0 days 00:00:00.001520	0.02%
Compiled node	0 days 00:00:00.001466	0.02%
Command	0 days 00:00:00.000737	0.01%
Concurrency	0 days 00:00:00.000382	0.0%
total	0 days 00:00:08.585337	100.0%

Not sure if this doesn't represent some kind of cached run where things happen faster.
Also, due to the logs structure I had to take some liberties - pick the action names by some heuristic, and assume the delta between each two consecutive events is the prior event's duration (added 0 as the last event's duration).

guyr-ziprecruiter · 2024-05-23T15:54:17Z

That was actually pretty dumb of me, I could just --log-format json.

Or could I...?

Trying to run with --log-format json gives me:

Traceback (most recent call last):
  File "/Users/guyr/miniconda3/envs/sections/lib/python3.10/site-packages/dbt/events/functions.py", line 217, in msg_to_dict
    msg_dict = MessageToDict(
TypeError: MessageToDict() got an unexpected keyword argument 'including_default_value_fields'

During handling of the above exception, another exception occurred:
(repeat until `RecursionError: maximum recursion depth exceeded`)

Attached is the output file: dbt-log-json-format-stderr.txt.gz.

Since locale.getlocale(locale.LC_TIME) is mentioned, here's also the output for locale on my machine:

$ locale
LANG=""
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Thanks!

dbeatty10 · 2024-05-30T13:42:30Z

@guyr-ziprecruiter could you open a bug report for the "maximum recursion depth exceeded" you mentioned in #10155 (comment) ? That way we can handle that unexpected error separately than your feature request.

I gave dbt compile --log-format json a quick attempt using dbt-core==1.8.0 and dbt-postgres==1.8.0 and it worked fine, so it must be something specific to your versions and/or environment.

guyr-ziprecruiter · 2024-06-09T17:39:49Z

Will do. For the meantime, let me rephrase my question differently:

During a dbt compile run on my laptop, the connectors are used, making every call slow and pretty demanding. Is there a way whatever is collected by the connectors become cached, allowing to run a consecutive call (say 1 second after the first one finished) end much faster?

guyr-ziprecruiter · 2024-06-09T17:41:31Z

@dbeatty10 my versions seem to be:

$ pip list | grep -i dbt
dbt                               1.0.0.35.1
dbt-artifacts-parser              0.5.1
dbt-athena-community              1.5.1
dbt-core                          1.5.0
dbt-duckdb                        1.5.1
dbt-extractor                     0.4.1
dbt-semantic-interfaces           0.4.2
dbt-spark                         1.5.0
dbterd                            1.12.0
sqlfluff-templater-dbt            2.3.5

So I believe this bug has already been fixed - it's my bad working with older versions 🤷

guyr-ziprecruiter added enhancement New feature or request triage labels May 16, 2024

jtcohen6 added awaiting_response and removed triage labels May 16, 2024

github-actions bot added triage and removed awaiting_response labels May 19, 2024

jtcohen6 added awaiting_response and removed triage labels May 21, 2024

github-actions bot added triage and removed awaiting_response labels May 22, 2024

dbeatty10 assigned jtcohen6 May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Faster, if unsafe, dbt-compile please! (perhaps without connectors) #10155

[Feature] Faster, if unsafe, dbt-compile please! (perhaps without connectors) #10155

guyr-ziprecruiter commented May 16, 2024

jtcohen6 commented May 16, 2024

guyr-ziprecruiter commented May 19, 2024

jtcohen6 commented May 21, 2024

guyr-ziprecruiter commented May 22, 2024

guyr-ziprecruiter commented May 22, 2024 •

edited

guyr-ziprecruiter commented May 23, 2024

dbeatty10 commented May 30, 2024

guyr-ziprecruiter commented Jun 9, 2024

guyr-ziprecruiter commented Jun 9, 2024

[Feature] Faster, if unsafe, dbt-compile please! (perhaps without connectors) #10155

[Feature] Faster, if unsafe, dbt-compile please! (perhaps without connectors) #10155

Comments

guyr-ziprecruiter commented May 16, 2024

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

jtcohen6 commented May 16, 2024

guyr-ziprecruiter commented May 19, 2024

jtcohen6 commented May 21, 2024

guyr-ziprecruiter commented May 22, 2024

guyr-ziprecruiter commented May 22, 2024 • edited

guyr-ziprecruiter commented May 23, 2024

dbeatty10 commented May 30, 2024

guyr-ziprecruiter commented Jun 9, 2024

guyr-ziprecruiter commented Jun 9, 2024

guyr-ziprecruiter commented May 22, 2024 •

edited