Possible spurious SettingWithCopyWarning #6757

Closed
jorisvandenbossche opened this issue Apr 1, 2014 · 26 comments
Comments

@jorisvandenbossche
Member

I found some SettingWithCopyWarning warnings when running older code with a newer pandas. It is quite possible that these are just side effects of the warning (false positives) that cannot easily be detected, but I am no expert in that area.

Simplified dummy example:

In [1]: df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('ABCD'))
In [2]: df
Out[2]:
    A   B   C   D
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

# reassign a selection to df
In [3]: df = df[(df['A']%5)!=0]

Setting a value with .loc[] (this output is from master; in 0.13.1 I even got the advice to "Try using .loc[row_index,col_indexer] = value instead" while I was already using it; on master there is just the warning):

In [4]: df.loc[df['B']==17, 'C'] = 1000
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  #!C:\Anaconda\envs\devel\python.exe

And then replacing a value with replace (here too, the advice to use .loc seems a little strange to me):

In [5]: df['D'] = df['D'].replace({7:2000})
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!C:\Anaconda\envs\devel\python.exe

In [6]: df
Out[6]:
    A   B     C     D
1   4   5     6  2000
2   8   9    10    11
3  12  13    14    15
4  16  17  1000    19
@jreback
Contributor

jreback commented Apr 1, 2014

master looks ok on these (no warnings) .......

IIRC 0.13.1 did have some false positives

I don't get them on 0.13.1

I am on numpy 1.7.1....that could be a factor....

@jorisvandenbossche
Member Author

Strange, I get these on

In [7]: pd.__version__
Out[7]: '0.13.1-496-gd0aebea'

In [8]: np.__version__
Out[8]: '1.7.1'

@jreback
Contributor

jreback commented Apr 1, 2014

In [1]: pd.__version__
Out[1]: '0.13.1-533-g2ada054'

In [2]: np.__version__
Out[2]: '1.7.1'

hmm...do you get them ALWAYS, even on a fresh ipython?

@jorisvandenbossche
Member Author

Strange, first it seemed that I couldn't replicate this with a fresh ipython, but now it seems that I only see it if I first print the frame:

If I run this on a fresh ipython, I get the warnings:

df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))
df
df = df[df['A']>0]
df.loc[df['B']==17, 'C'] = 1000
df['D'] = df['D'].replace({7:2000})

and with this I do not:

df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))
df = df[df['A']>0]
df.loc[df['B']==17, 'C'] = 1000
df['D'] = df['D'].replace({7:2000})

In the original code, though, the frame was certainly not printed first.

The above was run with the latest master ('0.13.1-543-g4bd1e6a').

@jreback
Contributor

jreback commented Apr 1, 2014

hmm...ok I can replicate that

It is legit because of: df = df[df['A']>0]

which is a sliced version of the original. When you then set with loc, you could misinterpret this as setting the original frame (that's the intent of the warning).

The problem is that you are reassigning df to a version of itself; if you use a different variable name then you don't get this. This might be a case where this is not detectable. Let me see.

In [10]: df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))

In [11]: x = df[df['A']>0]

In [12]: df.is_copy

In [13]: x.is_copy
Out[13]: <weakref at 0x4746d08; to 'DataFrame' at 0x4759610>

In [14]: x.loc[df['B']==17, 'C'] = 1000

In [15]: x['D'] = df['D'].replace({7:2000})
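A minimal sketch of one way to make the intent explicit, assuming the goal is to keep working only with the filtered rows. Adding `.copy()` to the filter is the remedy suggested later in this thread for a similar case; the name `original` is introduced here purely for comparison and is not part of the reported code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))
original = df  # hypothetical extra handle on the unfiltered frame

# .copy() makes the filtered frame an independent object, so later
# assignments clearly target the new frame and not the original.
df = df[df['A'] > 0].copy()

df.loc[df['B'] == 17, 'C'] = 1000
df['D'] = df['D'].replace({7: 2000})

# The filtered frame carries the new values; the original does not.
print(df.loc[4, 'C'], df.loc[1, 'D'])
print(original.loc[4, 'C'], original.loc[1, 'D'])
```

Whether a warning is emitted without the `.copy()` depends on the pandas version, so the sketch only checks the resulting values.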

@jorisvandenbossche
Member Author

I also get the warning when assigning to a different variable (and then also without printing the frame first!):

In [1]: df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))

In [2]: x = df[df['A']>0]

In [3]: x.loc[df['B']==17, 'C'] = 1000
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  #!C:\Anaconda\envs\devel\python.exe

But I suppose here the warning is legitimate, as it is indeed not changed in the original.

@jreback
Contributor

jreback commented Apr 1, 2014

that is legit

@jorisvandenbossche
Member Author

ah, yes, was just updating my comment saying that.

@jreback
Contributor

jreback commented Apr 1, 2014

of course with the caveat that it's not wrong per se, just a warning (mainly for new users).

@bluefir

bluefir commented Apr 2, 2014

I get this when I do inplace sort_index and fillna:

portfolio_analytics\attribution\Hierarchies.py:212: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
data_frame.sort_index(inplace=True)

C:\Python27\lib\site-packages\pandas\core\generic.py:2174: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
obj.fillna(v, inplace=True)
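For comparison, a rough sketch of the same two operations written without `inplace=True`, reassigning the result instead. This is an assumption about a possible workaround, not something confirmed in this thread; the frame `source` and its contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an unsorted index and one missing value.
source = pd.DataFrame({'value': [1.0, np.nan, 3.0]}, index=[2, 0, 1])

# Take an explicit copy of a selection, then reassign instead of
# mutating in place.
data_frame = source[source.index >= 0].copy()
data_frame = data_frame.sort_index()
data_frame = data_frame.fillna(0.0)

print(data_frame)
print(source)  # still unsorted, still contains the NaN
```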

@bluefir

bluefir commented Apr 2, 2014

Another one:

C:\Python27\lib\site-packages\pandas\core\indexing.py:346: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
self.obj[item] = s

@bluefir

bluefir commented Apr 2, 2014

Wow! This thing is everywhere:

# Recalculate portfolio betas
portfolio_data_returns[field_beta_by_portfolio_weight] = portfolio_data_returns[field_beta] * portfolio_data_returns[field_portfolio_weight]
portfolio_betas = portfolio_data_returns[field_beta_by_portfolio_weight].groupby(level=field_date).sum()

-c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

@jreback
Contributor

jreback commented Apr 2, 2014

@bluefir

bluefir commented Apr 2, 2014

Ok. In my last example, what am I doing inplace?

@jreback
Contributor

jreback commented Apr 2, 2014

it prob was already not a copy
the assignment does the check

you need to look at the first time it happens
and can always check is_copy property

@bluefir

bluefir commented Apr 2, 2014

It's a copy, yes. I wanted it explicitly just to make sure the original DataFrame is left intact.

portfolio_data_returns = portfolio_data_all.loc[return_date_first:return_date_last].copy()

@jreback
Contributor

jreback commented Apr 2, 2014

and is that a series?

@bluefir

bluefir commented Apr 2, 2014

DataFrame

@jreback
Contributor

jreback commented Apr 2, 2014

you need to use loc just as it says then

@fonnesbeck

I'm getting similar warnings for an operation like this, where I am just trying to truncate a variable at a particular value:

lab_subset.YEAR_AGE[lab_subset.YEAR_AGE > 75] = 75

So, I go in and try to use .loc:

lab_subset.YEAR_AGE.loc[lab_subset.YEAR_AGE > 75] = 75

But I get the same warning. The object lab_subset is also the result of an indexing operation:

lab_subset = measles_data[(CONFIRMED | DISCARDED) & measles_data.YEAR_AGE.notnull() & measles_data.COUNTY.notnull()]

So, I tried to use .loc on that as well, but the warning persists.

It's not clear to me what is going on here. Running '0.14.1-486-g1d65bc8' on Python 2.7.6 and OS X 10.9.5.

@jreback
Contributor

jreback commented Sep 22, 2014

no, you are still chaining, you need to use .loc on the DataFrame

lab_subset.loc[lab_subset.YEAR_AGE > 75,'YEAR_AGE'] = 75
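To make the difference concrete, here is a small sketch with a made-up stand-in for `lab_subset` (only the column name `YEAR_AGE` comes from the example above; the data and the `COUNTY` values are hypothetical). The chained forms first pull out the `YEAR_AGE` Series and then assign into that intermediate object; the single `.loc` call hands the row mask and the column label to the DataFrame in one step.

```python
import pandas as pd

# Hypothetical stand-in for lab_subset.
lab_subset = pd.DataFrame({
    'YEAR_AGE': [3.0, 80.0, 75.0, 91.0],
    'COUNTY': ['a', 'b', 'c', 'd'],
})

# Chained (two indexing steps, assignment hits the intermediate Series):
#   lab_subset.YEAR_AGE[lab_subset.YEAR_AGE > 75] = 75
#   lab_subset.YEAR_AGE.loc[lab_subset.YEAR_AGE > 75] = 75

# Single step on the DataFrame: row selector and column label together.
lab_subset.loc[lab_subset.YEAR_AGE > 75, 'YEAR_AGE'] = 75

print(lab_subset)
```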

@fonnesbeck

OK, I see now. Man, that's going to be a tough one for new users to digest.

@jreback
Contributor

jreback commented Sep 22, 2014

that's why it's a warning! That's the reason for it though. It sometimes does work (and has been around since 0.13.0), just getting better / less spurious through the versions.

@fonnesbeck

So, I still get the warning despite the syntax change. lab_subset isn't a view on measles_data, is it?

@jreback
Contributor

jreback commented Sep 22, 2014

you should put .copy() at the end of the first expression. Otherwise you end up changing data in measles_data! (if it's a view, which it will be if it's a single dtype; otherwise it would make a copy).

But that's the rub: you don't want to have to care/know whether it's a view (and if it's a view, you certainly don't want to propagate back to the original, except explicitly).
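A short sketch of that suggestion, assuming a tiny invented `measles_data` frame. The `CONFIRMED | DISCARDED` part of the original mask is omitted because its definition is not shown in the thread.

```python
import pandas as pd

# Hypothetical data; only the column names come from the thread.
measles_data = pd.DataFrame({
    'YEAR_AGE': [10.0, 80.0, None, 90.0],
    'COUNTY': ['x', 'y', 'z', None],
})

mask = measles_data.YEAR_AGE.notnull() & measles_data.COUNTY.notnull()

# .copy() at the end of the selecting expression: lab_subset is now an
# independent frame, whether or not the selection would have been a view.
lab_subset = measles_data[mask].copy()

lab_subset.loc[lab_subset.YEAR_AGE > 75, 'YEAR_AGE'] = 75

print(lab_subset)
print(measles_data)  # the value 80.0 in row 1 is untouched
```

With the explicit copy, the truncation is visible only in `lab_subset`, and nothing propagates back to `measles_data`.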

@jreback
Contributor

jreback commented Oct 20, 2015

closing as stale. pls reopen if still an issue.

@jreback jreback closed this as completed Oct 20, 2015