[WIP] Airflow package catalog connector for Airflow 2.x wheel, make import of all core operators possible, parsing changes #3208

Open
wants to merge 1 commit into
base: main

@shalberd shalberd commented Jan 15, 2024

fixes #2124

@nanaones I am unsure whether the Airflow package catalog connector already existed as a feature in 2021; it does now. Still, even on initial setup of the package catalog connector, the import is incomplete when pointing it at the wheel file containing the Airflow 2.x core operators, for example:

https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl

For example, the BashOperator and EmailOperator are missing. It looked like the detection logic needed fixes, something that @ianonavy fixed in a fork.

The messages in the Elyra container log are as follows:

[I 2024-01-05 10:52:38.508 ElyraApp] Analysis of 'https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl' completed. Located 9 operator classes in 4 Python scripts.
[W 2024-01-05 10:52:38.521 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
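
For readers unfamiliar with the "Python scripts" count in the log above: a wheel is a zip archive, so the candidate scripts can be listed without installing the package. The following is only a minimal sketch and not Elyra's connector code; the function name `list_operator_scripts` and the `airflow/operators/` prefix are assumptions chosen for illustration, based on the file paths shown in the warnings.

```python
import io
import zipfile
from typing import List


def list_operator_scripts(wheel_bytes: bytes, prefix: str = "airflow/operators/") -> List[str]:
    """Return the sorted .py files under `prefix` in a wheel given as raw bytes.

    Hypothetical helper: the prefix is an assumption, not Elyra's actual filter.
    """
    # A .whl file is a regular zip archive, so zipfile can read it directly.
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if name.startswith(prefix)
            and name.endswith(".py")
            and not name.endswith("__init__.py")
        )
```

In practice the bytes would come from downloading the wheel URL configured in the catalog; that step is left out here to keep the sketch free of network access.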

The package catalog connector code needs fixes. This PR is based on work by @ianonavy in a fork of the package catalog connector, so that the BashOperator and, in general, the base operators provided by Airflow work.

Concept behind the Airflow package catalog connector in the Elyra main source: https://github.com/elyra-ai/elyra/tree/main/elyra/pipeline/airflow/package_catalog_connector

The work was done in a fork outside community Elyra, but was never tested or discussed here so far.

I am including this change here so that it makes it into community Elyra.

As it is now, only an incomplete subset of operators is made available by the Airflow package catalog connector when used with an Airflow 2.x wheel file.

Screenshot 2024-01-12 at 17:48:41

The fix in this PR's commit makes the BaseOperator reference location compatible with Airflow 2.x:

https://airflow.apache.org/docs/apache-airflow/2.6.2/_api/airflow/models/baseoperator/index.html#

BaseOperator has been in this new location all the way back to 2.0.0:

https://airflow.apache.org/docs/apache-airflow/2.0.0/_api/airflow/models/baseoperator/index.html

The change in this PR finally makes all operators available in the Elyra pipeline editor:

https://airflow.apache.org/docs/apache-airflow/2.6.2/operators-and-hooks-ref.html

After the change, I get a different log on Elyra start when the wheel file is evaluated. It looks much better, with more operator classes detected (16 instead of 9).

[I 2024-01-12 22:25:00.524 ElyraApp] Analysis of ''https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl'' completed. Located 16 operator classes in 11 Python scripts.
[W 2024-01-12 22:25:00.568 ServerApp] Operator 'BaseBranchOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/branch.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.571 ServerApp] Operator 'EmptyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/empty.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.587 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.592 ServerApp] Operator 'LatestOnlyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/latest_only.py'}' does not have an __init__ function. Skipping...

The operators skipped in the warnings above are pipeline-related, but not via the mechanisms offered by KubernetesPodOperator and IBM pipelines, so they should be skipped from a use-case perspective, and also because they do not define an __init__ of their own. Besides that, the EmptyOperator is a placeholder dummy operator anyway, with no functionality.
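
To illustrate what "does not have an __init__ function" means in the warnings, here is a minimal sketch using Python's `ast` module. It is not the connector's actual implementation; the function name `classes_with_own_init` and the sample classes in the usage note are hypothetical. It only checks whether a class body defines `__init__` itself, so a class that merely inherits its constructor is reported as `False`.

```python
import ast
from typing import Dict


def classes_with_own_init(source: str) -> Dict[str, bool]:
    """Map each top-level class name in `source` to whether it defines __init__ itself."""
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Only look at the class's own body; inherited methods are not visible here.
            result[node.name] = any(
                isinstance(item, ast.FunctionDef) and item.name == "__init__"
                for item in node.body
            )
    return result
```

For a hypothetical module containing one class with its own `__init__` and one subclass that only overrides `execute`, the first maps to `True` and the second to `False`, which matches the kind of class the connector reports as skipped.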

Let's check the GUI after the wheel file import with the change in this PR.

I can now, for example, see and use the BashOperator with Airflow 2.x.

Screenshot 2024-01-12 at 23:31:57

What changes were proposed in this pull request?

No code refactoring, just a small change to the operator detection logic for Airflow 2.0.0 and higher.

BaseOperator has been in its new location all the way back to 2.0.0:

https://airflow.apache.org/docs/apache-airflow/2.0.0/_api/airflow/models/baseoperator/index.html

Add package connector support for Airflow 2.x: the check for subclasses of BaseOperator uses an outdated package name. This commit and PR add the new one from Airflow 2.
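
The idea can be sketched as follows. This is a simplified illustration, not the code changed in this PR: the function name `find_operator_classes` and the constant `BASE_OPERATOR_MODULES` are hypothetical, and treating `airflow.models` as the older import path is an assumption. The sketch accepts a `BaseOperator` import from either module path and then reports classes that directly subclass it.

```python
import ast
from typing import List

# Module paths accepted as the source of BaseOperator. The second entry is the
# Airflow 2.x location referenced in this PR; the first is assumed to be the older one.
BASE_OPERATOR_MODULES = {"airflow.models", "airflow.models.baseoperator"}


def find_operator_classes(source: str) -> List[str]:
    """Return names of top-level classes that directly subclass an imported BaseOperator."""
    tree = ast.parse(source)

    # Collect the local names bound to BaseOperator by `from <module> import ...`.
    base_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module in BASE_OPERATOR_MODULES:
            for alias in node.names:
                if alias.name == "BaseOperator":
                    base_names.add(alias.asname or alias.name)

    operators = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Only simple-name bases are checked; indirect subclasses are not resolved.
            if any(isinstance(b, ast.Name) and b.id in base_names for b in node.bases):
                operators.append(node.name)
    return operators
```

With only the older module path in the set, a script that uses `from airflow.models.baseoperator import BaseOperator` would yield no operators, which is consistent with the incomplete import described above.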

How was this pull request tested?

No changes were made to any existing Elyra unit tests.
However, the flow of importing the Airflow 2.x core operators is explained above, with the results before and after the change in this PR.
The import worked fine, and the operators are visible in the left palette of the Elyra pipeline editor.
Tested with: Airflow 2.6.2

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

…s of BaseOperator uses an outdated package name. This commit adds the new one from Airflow 2.

Signed-off-by: Sven Thoms <21118431+shalberd@users.noreply.github.com>
@shalberd shalberd changed the title Airflow package catalog connector for Airflow 2.x wheel, make import of all core operators possible [WIP] Airflow package catalog connector for Airflow 2.x wheel, make import of all core operators possible, parsing changes Jan 17, 2024
@shalberd shalberd added the status:Work in Progress Development in progress. A PR tagged with this label is not review ready unless stated otherwise. label Jan 17, 2024