Query interface examples #54

Closed
schmitts opened this issue Feb 21, 2020 · 5 comments

@schmitts

Could you please add examples on how to query/select? E.g. how does a pandas query like below look in DataFrame?

df_filtered = df[df.something > 5]
@hosseinmoein
Owner

hosseinmoein commented Feb 21, 2020

The equivalent to that, in DataFrame, is either of the following methods:

get_data_by_sel()
get_view_by_sel()

The first returns a new DataFrame; the second returns a view.

The documentation for them is here:
https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/get_data_by_sel.html

There are examples in the tester file here:
https://github.com/hosseinmoein/DataFrame/blob/master/test/dataframe_tester_2.cc#L4396
and here:
https://github.com/hosseinmoein/DataFrame/blob/master/test/dataframe_tester_2.cc#L4621

Both are similar to the query in your question.

@yegorrr

yegorrr commented Jul 28, 2022

How can I do something like
df["price_usd"] = df["price"] * df["fx_rate"]
or
df["x"] += constant
for each timestamp (including missing data), i.e. something like a visitor but for rows instead of columns?

@hosseinmoein
Owner

There are a few different ways:

  1. Look at consolidate().
  2. Write your own visitor, which is simple.
  3. Do it manually with a loop.
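For the third option, a minimal sketch of the manual loop might look like the following. It deliberately does not use the DataFrame API: plain `std::vector<double>` columns stand in for the data, and `kMissing`, `multiply_columns`, and `add_constant` are hypothetical names introduced only for illustration. It assumes both columns share the same index and that missing values are stored as NaN.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: missing values are represented as NaN.
constexpr double kMissing = std::numeric_limits<double>::quiet_NaN();

// Hypothetical helper, not part of the DataFrame library:
// row-wise product of two columns that share the same index.
std::vector<double>
multiply_columns(const std::vector<double> &price,
                 const std::vector<double> &fx_rate) {
    assert(price.size() == fx_rate.size());

    std::vector<double> out(price.size());
    for (std::size_t i = 0; i < price.size(); ++i) {
        // NaN in either input propagates, so a missing row stays missing.
        out[i] = price[i] * fx_rate[i];
    }
    return out;
}

// Hypothetical helper: in-place equivalent of df["x"] += constant.
void add_constant(std::vector<double> &column, double constant) {
    for (double &value : column) {
        value += constant;  // NaN + constant is still NaN
    }
}
```

With real DataFrame columns, the inputs would be fetched from the frame first and the result loaded back as a new column; consult the library documentation for the exact calls.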

@yegorrr

yegorrr commented Jul 29, 2022

thanks, consolidate is probably what i was looking for.

Another question: assume I'm adding the last trade prices of some asset in real time, and I'm only interested in its current price and its moving average over x seconds. What would be the most efficient way?

  1. Store all trades and timestamps in a DataFrame as they arrive and, on each insert, call get_data_by_sel() to filter the trades from the last x seconds and average that data.
  2. Create a separate structure, a kind of queue that stores only data from the last x seconds. Each insert may pop an arbitrary number of elements from the queue, from 0 to N. Assuming all data is sorted by timestamp, this would be as efficient as 1).
  3. something else?

@hosseinmoein
Owner

If you don’t need all the data, why store it? In that case, something like option 2 is better.
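One way to sketch option 2 is a queue that keeps only the trades inside the window, plus a running sum. This is independent of the DataFrame library; the class name `WindowedAverage` and its interface are hypothetical. It assumes integer timestamps in seconds that arrive in non-decreasing order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical helper: current price and moving average over the
// last `window_secs` seconds, storing only the trades in that window.
class WindowedAverage {
public:
    explicit WindowedAverage(std::int64_t window_secs)
        : window_(window_secs) {
        assert(window_secs > 0);
    }

    // Assumption: timestamps arrive in non-decreasing order.
    void add(std::int64_t ts, double price) {
        trades_.emplace_back(ts, price);
        sum_ += price;
        // Evict trades outside (ts - window, ts]. The trade just added is
        // always inside the window, so the queue never becomes empty here.
        while (trades_.front().first <= ts - window_) {
            sum_ -= trades_.front().second;
            trades_.pop_front();
        }
    }

    // Both accessors require at least one prior call to add().
    double last() const {
        assert(!trades_.empty());
        return trades_.back().second;
    }

    double average() const {
        assert(!trades_.empty());
        return sum_ / static_cast<double>(trades_.size());
    }

    std::size_t size() const { return trades_.size(); }

private:
    std::int64_t window_;
    std::deque<std::pair<std::int64_t, double>> trades_;
    double sum_ = 0.0;
};
```

Each trade is pushed once and popped at most once, so the cost per insert is amortized constant and memory is bounded by the number of trades in the window. The running sum can accumulate floating-point error over a long session; recomputing it from the queue occasionally is one way to limit that.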
