The Distribution column of Feature Statistics displays a histogram of each feature's values, with the values of numeric features split into bins. The bars can be colored according to the values of another variable, but the widget does not show which color corresponds to which value of that variable. For instance, in the feature statistics for a dataset about 3D printers, it is unclear what the blue and red colors stand for unless a Color widget is added before or after Feature Statistics (which requires keeping another window open just to see the legend).
The distribution histograms are nice for a first impression, but the information is for visual inspection only. To get the numbers, the Distributions widget has to be used (admittedly with more options to control the histograms).
Since the histogram bar values are calculated anyway, and since Feature Statistics allows selecting one or more features, I guess it wouldn't be too difficult to add a "Selected Feature Statistics" output port that gives the same output as Distributions would give for that feature, with the split by the same other feature and the same bin width (which, for numeric features, seems to be fixed so that there are 10 bins in Feature Statistics).
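To make the request concrete, here is a minimal sketch of the numbers such an output could carry: per-bin counts for one numeric feature, split by a second feature. It is plain Python, not Orange code. The function name `binned_counts` is hypothetical, and the 10 equal-width bins are an assumption based on what Feature Statistics appears to do.

```python
def binned_counts(values, groups, n_bins=10):
    """Count values per equal-width bin, split by group label.

    Hypothetical helper for illustration only; n_bins=10 is an assumption
    about how Feature Statistics bins numeric features.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = {g: [0] * n_bins for g in sorted(set(groups))}
    for v, g in zip(values, groups):
        # The maximum value goes into the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[g][idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Example: six values split by a two-valued feature.
edges, counts = binned_counts([0, 1, 2, 5, 9, 10], ["a", "b", "a", "b", "a", "b"])
```

Each key in `counts` corresponds to one bar color, so a table like this would also answer the legend question for the selected feature.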
What's your proposed solution?
Provide a legend for the colors, for instance to the right of the Color drop-down, or on mouse-hover over the unsplit bars of the feature selected for coloring. This would really improve the usability of Feature Statistics.
The output port would be more of a nice-to-have, as the same result can also be achieved with Distributions. It would require limiting the selection to one feature and replacing the Reduced Data output, whose functionality can also be achieved with Select Columns.
Are there any alternative solutions?
As indicated, by using the Color and Distributions widgets, respectively.
Adding the legend should be simple enough, so we'll do it.
As for the output, you're right: it's just a nice-to-have feature. It would also be problematic if multiple rows are selected, possibly with different numbers of bins. So let's leave this one to the Distributions and Discretization widgets.