chore(docs): create architecture page #28481

Open
wants to merge 11 commits into
base: master

sfirke
Contributor

@sfirke sfirke commented May 13, 2024

SUMMARY

  • Adds an "architecture" page summarizing the components of a Superset installation. A visual diagram should eventually go here too. Please feel free to edit/add/cut.
  • Reorders other installation pages to follow this one
  • Copyediting of Configuring Superset page

@github-actions github-actions bot added the doc Namespace | Anything related to documentation label May 13, 2024
@sfirke
Contributor Author

sfirke commented May 14, 2024

This is ready for review.

## Components

A Superset installation is made up of these components:
1. The Superset application itself
Member

Should we call it "the Superset [Python] backend"? "Centered around a web server [Flask] and async workers [Celery]". Architecturally we may want to break it down:

  • The Superset [python] backend
    • A web application [flask] serving assets and the Superset API
    • Async [celery] workers, taking on longer/heavier tasks beyond the scope of a web request and on a schedule

Member

Should we call out "The Superset frontend": a collection of React applications bundled with webpack and using antd, ...

A Superset installation is made up of these components:
1. The Superset application itself
2. A metadata database to store Superset's data about users, charts, dashboards, etc.
3. A caching layer (optional, but necessary for some features)
Member

This is called out twice, in "components" and "optional component". I think it's a great idea to separate the sections, but let's have each component on just one side.

@mistercrunch
Member

mistercrunch commented May 15, 2024

Hey! I'm hoping this doesn't come across as a hostile takeover of this PR, but I threw your pages into GPT and added some comments/input in the form of bullets, and this is what came out:


Architecture

This documentation outlines the architecture of Apache Superset, with a primary focus on the backend components before introducing the frontend. This organization provides a thorough understanding of how Superset operates from data handling to user interface.

Superset Backend

The backend of Superset is composed of several critical components designed to manage data, execute tasks, and maintain the overall functionality of the system.

Core Backend Components

Web Application [Python/Flask]:

  • Description: Serves static assets and handles API requests from the Superset frontend.
  • Role: Acts as the primary communication hub for frontend interactions and immediate query processing.

Metadata Database [Required]:

  • Description: Stores all of Superset’s essential assets such as dashboards, charts, user configurations, and logs.
  • Supported Technologies: PostgreSQL or MySQL (recommended), other SQLAlchemy-supported OLTP databases
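
As a rough illustration of how the metadata database is typically wired in, the sketch below builds a SQLAlchemy-style connection URI of the kind a `superset_config.py` file would assign to `SQLALCHEMY_DATABASE_URI`. The helper `build_metadata_db_uri`, the host name, and the credentials are all hypothetical placeholders and are not taken from this PR.

```python
from urllib.parse import quote_plus


def build_metadata_db_uri(user, password, host, port, database, driver="postgresql"):
    """Assemble a SQLAlchemy-style URI, percent-encoding the credentials."""
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# All values below are placeholders, not defaults shipped with Superset.
SQLALCHEMY_DATABASE_URI = build_metadata_db_uri(
    user="superset",
    password="p@ss/word",
    host="db.example.internal",
    port=5432,
    database="superset",
)
```

Percent-encoding the credentials keeps characters such as `@` or `/` in a password from being misread as URI delimiters.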

Optional Backend Components

Asynchronous Workers [Pytjhon/Celery]:

  • Description: Manages tasks that are too long or intense for a typical web request cycle.
  • Enabled Features:
    • Asynchronous query executions in SQL Lab.
    • Scheduled generation of alerts and reports.
    • Creation of dashboard thumbnails.
  • Dependencies: Requires a message queue (e.g., Redis, RabbitMQ).
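
To make the message-queue dependency concrete, here is a minimal sketch of a Celery settings class as it might appear in a `superset_config.py`. It assumes a local Redis instance acting as both broker and result backend; the URL and the module listed under `imports` are illustrative assumptions, so check them against the Superset documentation for your version.

```python
# Hypothetical Redis location; a RabbitMQ broker would use an amqp:// URL instead.
REDIS_URL = "redis://localhost:6379/0"


class CeleryConfig:
    # Message queue the web application uses to hand tasks to the workers.
    broker_url = REDIS_URL
    # Where workers store task state and results.
    result_backend = REDIS_URL
    # Modules whose tasks the workers should load (module name is an assumption).
    imports = ("superset.sql_lab",)


# Superset reads the Celery settings from this name in the config module.
CELERY_CONFIG = CeleryConfig
```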

Caching Layer:

  • Description: Enhances performance by caching query results and frequently accessed data.
  • Technologies: Primarily Redis, with support for other caching systems.
  • Enabled Features:
    • Accelerated access to repeated queries.
    • Improved responsiveness of the application.
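
A caching layer along these lines could be sketched as two Flask-Caching style settings dictionaries, one for metadata and one for query results. The helper `redis_cache_config`, the key prefixes, the timeouts, and the Redis URL are assumptions chosen for illustration rather than recommended values.

```python
def redis_cache_config(key_prefix, timeout_seconds, redis_url="redis://localhost:6379/1"):
    """Return a Flask-Caching style settings dict for a Redis-backed cache."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout_seconds,  # seconds before an entry expires
        "CACHE_KEY_PREFIX": key_prefix,            # keeps the two caches from colliding
        "CACHE_REDIS_URL": redis_url,
    }


# Short-lived cache for application metadata (placeholder timeout: 5 minutes).
CACHE_CONFIG = redis_cache_config("superset_metadata_", 300)
# Longer-lived cache for chart query results (placeholder timeout: 1 day).
DATA_CACHE_CONFIG = redis_cache_config("superset_data_", 24 * 60 * 60)
```

Distinct key prefixes let both caches share one Redis database without overwriting each other's entries.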

Logging Interfaces

  • Standard Output and Error Logs: Essential for debugging and monitoring application health.
  • StatsD/Metrics Collection: Enables real-time aggregation of performance metrics.
  • Analytics Logging: Rich structured logs that provide insight into user behavior and application usage patterns. Typically sent to a stream and landed in a data warehouse.

Other Common Infrastructure Components:

  • Load Balancers/API Gateway: Distributes incoming traffic across multiple servers to enhance availability and manage traffic peaks.
  • Observability/Alerting: Provides monitoring, error tracking, and real-time alerts to maintain performance and uptime.
  • WSGI Server (e.g., Gunicorn in async mode): Runs the Flask web application and handles incoming HTTP requests.
  • Database Drivers: Enables communication between Superset and its databases, crucial for operational data querying and management.
  • Orchestration (e.g., Kubernetes): Automates deployment, scaling, and management of containerized applications, ensuring robust service availability.
  • Additional Security Measures: Implements network security, data encryption, and access controls to safeguard data and comply with regulations.

Superset Frontend

The frontend of Superset is a sophisticated web client built using modern web technologies to facilitate interactive data visualization.

Core Technologies:

  • React: Forms the foundation of the frontend, offering a responsive and dynamic user interface.
  • antd: Utilized for designing the visual components and layout of the UI, providing a consistent and professional aesthetic.
  • Plugin Architecture: Allows for the extension of visualization capabilities through custom plugins, enhancing the flexibility and functionality of visual data representation.

Functionality:

  • Communicates with the backend via the Superset API, enabling users to manage and visualize data efficiently.
  • Supports extensive customization and extension through community-developed plugins and themes.

@rusackas
Member

GitHub needs the "face melt" emoji as a reaction.

Is it possible to "land and expand" here? My trust issues with GPT aside (did it really say "Pytjhon?") I wonder if we can merge a first iteration, then divide/expand/remove/elaborate as needed from there, rather than go deep here. We can feed it to GPT as we go for consolidation/clarification/organization.

@mistercrunch
Member

I did a fair amount of prompt input and editing to get to that, but the main thing is the structure of the docs (backend/frontend) and mentioning the technologies used (and technology choices) in different areas.

Labels
doc Namespace | Anything related to documentation size/M
3 participants