pd.DataFrame.agg(np.var) vs np.var(pd.Series)
I am applying np.var() in two ways to the same data (a pandas Series of SAT Math scores), but the two calls give different results. I did not expect this to be an n vs n-1 issue, since it is the same NumPy function applied to the same data.
These are the two ways:
- Directly onto a Series
- Passing it to the .agg() method of a column selected from the DataFrame
I have read elsewhere that a difference like this can come from how the variance is calculated, i.e. dividing by n vs n-1.
I am hoping for some confirmation or clarification. I am puzzled because I pass the same function, np.var(), in both cases:
np.var(sat_2017.Math), np.std(sat_2017.Math)
sat_2017.iloc[:,3].agg([np.var, np.std])
Output:
From the direct np.var / np.std calls:
- Variance: 7068.194540561321
- Std.Deviation: 84.07255521608297
From .agg([np.var, np.std]):
- Variance: 7209.558431
- Std.Deviation: 84.909119
Answer
Based on the source code, this seems like a bug.
When pd.Series.agg gets a function object, it looks it up in its predefined table of Cython functions:
# pandas.core.base line:555
f = self._is_cython_func(arg)
# pandas.core.base line:639
def _is_cython_func(self, arg):
    """ if we define an internal function for this argument, return it """
    return self._cython_table.get(arg)
which contains:
pd.Series._cython_table
OrderedDict([(<function sum(iterable, start=0, /)>, 'sum'),
...
(<function numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)>,'var'),
so for np.var the lookup returns:
f == self._is_cython_func(arg) == 'var'
This name is then passed to getattr:
# pandas.core.base line 556
if f and not args and not kwargs:
    return getattr(self, f)(), None
which returns:
getattr(pd.Series, 'var')
<function pandas.core.series.Series.var(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)>
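The dispatch traced above can be mimicked with a small sketch. `TOY_TABLE` and `toy_agg` are hypothetical stand-ins written for illustration, not pandas internals, and the data is made up. The sketch only assumes that `Series.var()` defaults to `ddof=1` and `np.var` to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the internal function-to-name table.
TOY_TABLE = {np.var: "var", np.std: "std"}


def toy_agg(series, func):
    """Swap a known function for the Series method of the same name."""
    name = TOY_TABLE.get(func)
    if name is not None:
        # The Series method runs with its own defaults (ddof=1 for var/std),
        # so the defaults of the function that was passed in are ignored.
        return getattr(series, name)()
    # Any other callable is applied to the underlying values unchanged.
    return func(series.to_numpy())


s = pd.Series([1.0, 2.0, 3.0, 4.0])  # made-up data

direct = np.var(s.to_numpy())    # NumPy default, ddof=0
dispatched = toy_agg(s, np.var)  # routed to Series.var(), ddof=1

print(direct, dispatched)
```

The same callable gives two different numbers depending on whether it is called directly or replaced by name.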
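And there is the culprit! Series.var defaults to ddof=1, whereas np.var defaults to ddof=0, so the function passed to agg ends up running with a different divisor.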
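One way to sidestep the ambiguity is to state `ddof` explicitly on both sides. The following is a minimal sketch on made-up data, not the asker's `sat_2017` frame. It also checks that the two variances reported in the question differ by the factor n/(n-1) that a switch from `ddof=0` to `ddof=1` would produce.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])  # made-up data
n = len(s)

# Explicit ddof makes pandas and NumPy agree.
pd_population = s.var(ddof=0)                 # divides by n
pd_sample = s.var(ddof=1)                     # divides by n - 1
np_population = np.var(s.to_numpy(), ddof=0)
np_sample = np.var(s.to_numpy(), ddof=1)

# The two estimators differ only by the factor n / (n - 1).
factor = n / (n - 1)
print(pd_population, pd_sample, pd_population * factor)

# Figures copied from the question.
reported_direct = 7068.194540561321
reported_agg = 7209.558431
ratio = reported_agg / reported_direct
print(ratio)
```

The ratio of the reported variances comes out at about 1.02, which equals n/(n-1) for n = 51. The question does not state the row count, so n = 51 is an inference, but it is consistent with the ddof explanation.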
source: stackoverflow.com