RFileOp and PythonFileOp log stdout and stderr output to stdout instead of S3 file #3227

Open

shalberd wants to merge 1 commit into main
Conversation


@shalberd shalberd commented Apr 25, 2024

Fixes #3222

I am marking this WIP because of a pending answer regarding whether log output should still be written to a file and uploaded to S3, which is also the cause of the current lint error in the checks. However, the change as it stands already solves the issue of no logs appearing in Airflow or KFP for R and Python generic pipeline nodes.

What changes were proposed in this pull request?

Hello! I have a problem with my Elyra pipelines not showing logs.
I have looked for issues on GitHub but couldn't find anything related, and I am not sure if I am missing some needed config or if this is a bug.
When I run an Elyra pipeline with two nodes, one a Jupyter notebook and one a Python script, only the Jupyter notebook one shows the logging. I am not sure why this is the case, as they are both managed the same way.

This error happens in both KFP and Airflow pipeline runs. It is related to the bootstrapper.py file that executes the scripts behind generic pipeline nodes: in this case, PythonFileOp and RFileOp use the same subprocess.run pattern in both the KFP and the Airflow bootstrapper.py.

https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L521
https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L332

i.e.

    with open(r_script_output, "w") as log_file:
        subprocess.run(run_args, stdout=log_file, stderr=subprocess.STDOUT, check=True)

We only see the name of the processed file in stdout and no logs in the case of Python or R nodes.
This change uses the subprocess.run mechanism in such a way that all stderr and stdout output is captured in a variable whose content (i.e. the script run output) is then logged.
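
A minimal sketch of this captured-output approach, assuming run_args is the command list the bootstrapper builds; the file name and logger setup here are illustrative, not the PR's exact code:

    import logging
    import subprocess

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    # Assumed stand-in for the argument list the bootstrapper builds,
    # e.g. ["Rscript", "my_script.r"] for an R node.
    run_args = ["python", "my_script.py"]

    # Capture stdout and stderr in memory instead of redirecting them to a file.
    result = subprocess.run(
        run_args,
        stdout=subprocess.PIPE,    # collect stdout
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        universal_newlines=True,   # decode bytes to str
    )

    # Log the captured output so it appears in the container / KFP / Airflow logs.
    logger.info(result.stdout)

    # Raise CalledProcessError on a non-zero exit code, after the output was logged.
    result.check_returncode()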

With this change, the stdout content is no longer written to the log output file in S3 bucket storage; or, better said, that file is now uploaded with empty content.
The question remains why the stdout of Notebook, Python, and R script runs was even put into a file and then uploaded to S3 storage before this change. Right now, a file with empty content is uploaded to S3.
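
Should the S3 log file be kept, the captured output could simply be written back to that file before the existing upload step. A hypothetical sketch; the variable names and the upload helper mentioned in the comments are assumptions, not code from this PR:

    # Stand-in for the output captured from subprocess.run (result.stdout above).
    captured_output = "example script output\n"
    # Hypothetical log file name for an R node's run output.
    r_script_output = "my_script.r.log"

    # Writing the captured output keeps the uploaded S3 log file non-empty.
    with open(r_script_output, "w") as log_file:
        log_file.write(captured_output)

    # The bootstrapper would then upload this file to object storage, e.g. via a
    # helper along the lines of put_file_to_object_storage(r_script_output)
    # (helper name assumed, not verified against the current code).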

How was this pull request tested?

Added the changed bootstrapper.py file for Airflow to a runtime image and used it in conjunction with Elyra and Airflow 2.8.2 to verify that the logs now appear in the container log output / in the Airflow Logs tab.
The results were positive and as intended, documented here:
#3222 (comment)
The question as to why script run log output is written back to S3 is posed there and in the Elyra Slack general channel as well.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

… appears in system logs, catch non zero return/exit code in exception, no logging of output to file in S3 COS

Signed-off-by: Sven Thoms <21118431+shalberd@users.noreply.github.com>
@shalberd added the status:Work in Progress, component:pipeline-editor, and component:pipeline-runtime labels on Apr 25, 2024
Development

Successfully merging this pull request may close these issues.

No logging in container for python scripts nodes