Feature Statistics: legend for "Color", and "Selected Feature Statistics" output with richer data #6730

Open
wvdvegte opened this issue Feb 13, 2024 · 1 comment

Comments

@wvdvegte

What's your use case?

  1. The Distribution column of Feature Statistics shows a histogram of each feature's values, with the values of numeric features split into bins. The bars can be colored according to the values of another variable, but the widget does not show which color corresponds to which value of that variable. For instance, in these feature statistics on a dataset about 3D printers, it is unclear what the blue and red colors stand for unless a Color widget is added before or after Feature Statistics (which requires keeping another window open just to see the legend):
[image: screenshot of Feature Statistics on the 3D printer dataset]
  2. The distribution histograms are nice for a first impression, but they are for visual inspection only. To get the numbers, the Distributions widget has to be used (admittedly, it offers more options to control the histograms).
    Since the histogram bar values are calculated anyway, and since Feature Statistics allows selecting one or more features, I guess it wouldn't be too difficult to add a "Selected Feature Statistics" output port that gives the same output as Distributions would give for that feature, with a split by the same other feature and the same bin width (which, for numeric values, seems to be fixed so that there are 10 bins in Feature Statistics).
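
To illustrate the kind of data such an output could carry, here is a minimal pure-Python sketch of per-bin counts split by a second feature. It is not Orange's implementation: the function name `binned_counts_by_group`, the equal-width binning, and the default of 10 bins are assumptions for illustration, and missing values are assumed to have been removed beforehand.

```python
from collections import defaultdict


def binned_counts_by_group(values, groups, n_bins=10):
    """Count values per equal-width bin, separately for each group label.

    Hypothetical helper, not part of Orange. Assumes `values` is a non-empty
    sequence of numbers without missing values and `groups` has the same length.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    edges = [lo + i * width for i in range(n_bins + 1)]

    counts = defaultdict(lambda: [0] * n_bins)
    for value, group in zip(values, groups):
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[group][idx] += 1
    return edges, dict(counts)


values = list(range(10))
groups = ["a" if v % 2 == 0 else "b" for v in values]
edges, counts = binned_counts_by_group(values, groups)
for label, row in counts.items():
    print(label, row)
```

Each row of the result corresponds to one color in the histogram, so the same structure would cover both the legend (the group labels) and the numbers behind the bars.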

What's your proposed solution?

  1. Provide a legend for the colors, for instance to the right of the Color drop-down, or on mouse hover over the unsplit bars of the feature selected for coloring. This would really improve the usability of Feature Statistics.
  2. This would be more of a nice-to-have, as it can also be achieved with Distributions. It would require limiting the selection to one feature and replacing the Reduced Data output, whose functionality can also be achieved with Select Columns.

Are there any alternative solutions?
As indicated, by using the Color and Distributions widgets, respectively.

@janezd janezd self-assigned this Feb 16, 2024
@janezd
Contributor

janezd commented Feb 16, 2024

Adding the legend should be simple enough, so we'll do it.

As for the output, you're right: it's just a nice-to-have feature. It would also be problematic if multiple rows were selected, possibly with different numbers of bins. So let's leave this one to the Distributions and Discretization widgets.

@janezd janezd removed their assignment Feb 16, 2024