Issues using SMOTE #27

Ayyappatheegala · 2016-01-14T09:02:50Z

Hi,
First of all, thank you for providing us with this nice library.

I have an imbalanced dataset and I've loaded the dataset using pandas.
When I'm supplying the dataset as input to the SMOTE I'm getting the following error:

ValueError: Expected n_neighbors <= n_samples,  but n_samples = 1, n_neighbors = 6

Thanks in Advance

fmfn · 2016-01-16T03:07:17Z

What exactly are you passing as input? Do you mind sharing the shape of the dataset and the distribution of each class?

The error message seems pretty clear, and indicates you don't have enough samples in your minority class (I'm guessing you are using the regular variation of SMOTE).

Ayyappatheegala · 2016-01-16T04:15:04Z

I'm trying to apply the SMOTE Algorithm on my dataset which consists of 93 minority and 250 majority class points.
The dimension of each vector is 67730
i.e., the shape of the dataset is (96 * 67730) and (250 * 67730).

Are there any constraints or pre-conditions for using your library ?

diego898 · 2016-02-16T19:33:38Z

same issue here

fmfn · 2016-02-16T19:54:05Z

@Ayyappatheegala Sorry, I didn't see your message until today.

No constraints, at least that I know about. I don't think it will work with sparse data, and to be honest, given the dimensionality of your dataset, KNN (hence SMOTE) is likely to fail.

Regarding the error you are getting, it is hard for me to know what is happening from the ValueError alone. Perhaps you could share the full error message? The ValueError being raised is likely coming from scikit-learn, it might be due to the object misinterpreting the input data, or maybe the KNN object failing due to the extreme nature of the problem.

Ayyappatheegala · 2016-02-17T03:57:10Z

Hi Fernando,
The error has been solved for me.

It was again due to the format of the input supplied, as your package assumes all inputs are numpy arrays.

@diego898: Please refer to issue #31

Pulkit-Khandelwal · 2016-12-17T23:07:24Z

ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6

I got the same error even though my input format was correct (numpy arrays). The error pops up because one of the classes does not have enough samples. Here, one of the classes has just one sample, hence the error. I resolved this by adding more samples to my dataset.

Also, I was trying to solve a multilabel problem (more than two classes). So, I used SMOTE c-1 times (where c is the number of classes).

@Ayyappatheegala @fmfn @diego898

vappiah · 2018-02-14T10:51:00Z

Hi, I have a similar problem and would appreciate your suggestions to help resolve it. I am trying to solve a multiclass problem, but because of the class imbalance I want to use SMOTE. The number of instances in each class is listed below.

Class A 9
Class B 644
Class C 2
Class D 289

Error message is
ValueError: Expected n_neighbors <= n_samples, but n_samples = 2, n_neighbors = 6

glemaitre · 2018-02-14T12:17:28Z

Internally, you cannot run a KNN with 6 neighbours on a class containing only 2 samples.
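
A quick way to see which class triggers the error is to count the samples per class before resampling. The sketch below uses only the standard library and the class counts reported above. `classes_below` is a hypothetical helper name, not part of imbalanced-learn, and the threshold of 6 is taken from the `n_neighbors = 6` in the error message.

```python
from collections import Counter

def classes_below(labels, n_neighbors=6):
    """Return the classes that have fewer than n_neighbors samples, sorted by label."""
    counts = Counter(labels)
    return sorted(cls for cls, n in counts.items() if n < n_neighbors)

# Label vector rebuilt from the class counts reported in this thread.
labels = ["A"] * 9 + ["B"] * 644 + ["C"] * 2 + ["D"] * 289

print(classes_below(labels))  # ['C']
```

Class A has 9 samples and passes the check, so only class C would need more data or a smaller neighbourhood.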

abautistah · 2018-02-26T20:09:19Z

So the point is that you need to have at least 6 samples in each class? I have the same error.

vappiah · 2018-02-26T22:36:43Z

Thanks Everybody.

glemaitre · 2018-02-27T12:32:38Z

> So the point is that you need to have at least 6 samples in each class? I have the same error.

Yes. You can reduce the number of neighbours used. 6 is the default defined in the original paper.
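
As a rough illustration of reducing the number of neighbours, the sketch below picks the largest value every class can satisfy. It assumes, as explained further down in this thread, that the neighbour search asks for one more neighbour than `k_neighbors`. The helper name `max_usable_k_neighbors` is hypothetical, and the cap of 5 is an assumption chosen to match the `n_neighbors = 6` in the error message.

```python
from collections import Counter

def max_usable_k_neighbors(labels, default=5):
    """Largest k_neighbors that every class can satisfy, capped at `default`.

    Assumes the neighbour search requests k_neighbors + 1 samples per class,
    so a class with n samples supports at most n - 1 neighbours.
    """
    smallest = min(Counter(labels).values())
    if smallest < 2:
        raise ValueError("a class with a single sample has no neighbours to interpolate with")
    return min(default, smallest - 1)

# Label vector rebuilt from the class counts reported in this thread.
labels = ["A"] * 9 + ["B"] * 644 + ["C"] * 2 + ["D"] * 289
k = max_usable_k_neighbors(labels)
print(k)  # 1
# The value could then be passed on, e.g. SMOTE(k_neighbors=k) in imbalanced-learn.
```

Taking the minimum over all classes is deliberately conservative, since the majority class is normally not over-sampled.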

barthwalsamarth · 2018-04-24T14:28:44Z

I have the x_val as (28,100) and y_val as (28,1) where 28 is the number of records and 100 are the features and 1 is the regression output for each record. But I still get the default error "ValueError: Expected n_neighbors <= n_samples, but n_samples = 3, n_neighbors = 6"
What am I doing wrong? Also, is SMOTE meant for regression as well, or only for classification? Below is what I am doing.
sm = SMOTE(ratio=0.5, random_state=10)
features_res, labels_res = sm.fit_sample(x_val, y_val)

glemaitre · 2018-04-24T21:33:28Z

@barthwalsamarth apparently you have 3 samples in one of the classes but ask for 6 neighbours

barthwalsamarth · 2018-04-25T08:34:32Z

@glemaitre but I don't have classes here. This is regression data: there are 100 features, all numeric, and one regression output. That makes me doubtful if SMOTE can be used with regression

glemaitre · 2018-04-25T09:44:53Z

> That makes me doubtful if SMOTE can be used with regression

Actually, you are right. SMOTE is designed for classification problems, not regression. All the methods in this package are for classification. We could extend it to regression, but we would need to find the right API. The literature on regression is also thinner.
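
To see why a regression target ends up producing this error, it may help to look at what happens when each distinct target value is read as a class label, which is presumably what a classification-oriented sampler does. The sketch below is standard library only. The target values are made up for illustration and `class_size_summary` is a hypothetical helper.

```python
from collections import Counter

def class_size_summary(y):
    """Return (number of distinct values, smallest group size, largest group size)."""
    sizes = Counter(y).values()
    return len(sizes), min(sizes), max(sizes)

# Made-up continuous targets: every distinct value is treated as its own class.
y_regression = [0.5, 1.25, 1.25, 2.0, 3.75, 3.75, 3.75, 4.1]
print(class_size_summary(y_regression))  # (5, 1, 3)
```

With continuous targets, most "classes" contain only a handful of samples, so a 6-neighbour search fails almost immediately.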

ghiander · 2018-06-26T09:47:36Z

Hello all,

I have read the full thread: I have converted my data into numpy array and used k_neighbors=1, but the algorithm still raises the following error:

Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2

It looks like it does not care about how many neighbors I decide to use to construct my synthetic samples.

glemaitre · 2018-06-27T11:47:46Z

> Expected n_neighbors <= n_sample

should be clear enough :)

But be careful, because having n_samples=1 is really a corner case for which I am not sure the algorithm will give anything useful.

ghiander · 2018-06-27T11:51:43Z

I used k_neighbors=1. Why does the algorithm say: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2?

ghiander · 2018-06-27T11:52:40Z

This is not about the setup I am using (which is for testing the library), rather about the effective way the algorithm works.

glemaitre · 2018-06-27T12:13:17Z

> I used k_neighbors=1. Why does the algorithm say: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2?

This error is raised by the nearest-neighbours of scikit-learn where the parameter is then called n_neighbors and not k_neighbors or m_neighbors which are specific to SMOTE.

We could actually catch the error and raise a more appropriate one that uses the SMOTE-specific naming. PRs welcome.

By recalling the documentation:

k_neighbors is the number of nearest neighbours used to construct synthetic samples.
n_neighbors is the number of nearest neighbours to use to determine if a minority sample is in danger.

Therefore, if the nearest-neighbors is given a single sample at fit and has to use the 5 nearest-neighbors (which are not there since we have a single sample), it leads to the given error.
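
The idea of catching the error and re-raising it with SMOTE's own parameter name could look roughly like the sketch below. It is not the library's code: `fit_nearest_neighbors` is a hypothetical stand-in that only reproduces the message seen in this thread, and `check_class` is an illustrative wrapper. The extra neighbour added before the search follows the explanation given later in this thread.

```python
def fit_nearest_neighbors(n_samples, n_neighbors):
    """Hypothetical stand-in for the neighbour search; fails like the message in this thread."""
    if n_neighbors > n_samples:
        raise ValueError(
            f"Expected n_neighbors <= n_samples, but n_samples = {n_samples}, "
            f"n_neighbors = {n_neighbors}"
        )

def check_class(class_label, n_samples, k_neighbors):
    """Run the stand-in search and translate its error into SMOTE's own parameter name."""
    try:
        # One extra neighbour is requested because each sample is its own nearest neighbour.
        fit_nearest_neighbors(n_samples, k_neighbors + 1)
    except ValueError as exc:
        raise ValueError(
            f"class {class_label!r} has {n_samples} sample(s), but k_neighbors={k_neighbors} "
            f"needs at least {k_neighbors + 1}"
        ) from exc

try:
    check_class("C", n_samples=2, k_neighbors=5)
except ValueError as err:
    message = str(err)
    print(message)
```

Chaining with `from exc` keeps the original scikit-learn style message in the traceback while the user sees the parameter they actually set.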

glemaitre · 2018-06-27T12:15:13Z

> This is not about the setup I am using (which is for testing the library)

Actually this is important because you need to fulfill some minimum assumptions required by the algorithm.

ghiander · 2018-06-27T13:19:30Z

I think your answer is not correct:

m_neighbors : int or object, optional (default=10)
If int, number of nearest neighbors to use to determine if a minority sample is in danger.
The algorithm raises an error that names the parameter n_neighbors, as you wrote. You recalled the documentation, but you confused m_neighbors with n_neighbors.

Secondly, I can definitely tell you that even if I set k_neighbors=1 and m_neighbors=1, the algorithm still raises the same error: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2

I am just trying to understand what exactly is not working.

glemaitre · 2018-06-27T13:45:45Z

> The algorithm raises an error that names the parameter n_neighbors, as you wrote. You recalled the documentation, but you confused m_neighbors with n_neighbors.

Nope, I did not misunderstand it. n_neighbors refers to the number of neighbors within the scikit-learn NN algorithm, which can come from either m_neighbors or k_neighbors in SMOTE. If you post the traceback, I can tell you exactly which of the NN objects is actually failing.

> Secondly, I can definitely tell you that even if I set k_neighbors=1 and m_neighbors=1, the algorithm still raises the same error: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2

Off the top of my head, the NN used in scikit-learn returns the query sample itself as its nearest neighbour, which we are not interested in, and therefore the number of neighbours is increased by one.

But still, I really think that having a single data point is really a corner case.

ghiander · 2018-06-27T13:52:23Z

> Off the top of my head, the NN used in scikit-learn returns the query sample itself as its nearest neighbour, which we are not interested in, and therefore the number of neighbours is increased by one.

Ok, I think that this is the actual answer. So, the neighbors' value is augmented by one, and for this reason, it raises that error. Otherwise, I did not understand why the algorithm was complaining about n_neighbors=2, if I did not specify that anywhere.

Thanks!
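
For anyone who wants to see the "plus one" behaviour in isolation, here is a minimal brute-force sketch using only the standard library (it needs Python 3.8+ for `math.dist`). It is a re-creation for illustration, not scikit-learn's or imbalanced-learn's implementation, and both function names are hypothetical.

```python
import math

def kneighbors(points, query, n_neighbors):
    """Brute-force search: indices of the n_neighbors points closest to `query`."""
    if n_neighbors > len(points):
        raise ValueError(
            f"Expected n_neighbors <= n_samples, but n_samples = {len(points)}, "
            f"n_neighbors = {n_neighbors}"
        )
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:n_neighbors]

def neighbors_excluding_self(points, index, k_neighbors):
    """Ask for k_neighbors + 1 results, then drop the query point itself (distance 0)."""
    found = kneighbors(points, points[index], k_neighbors + 1)
    return [i for i in found if i != index][:k_neighbors]

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
print(kneighbors(minority, minority[0], 1))      # [0]
print(neighbors_excluding_self(minority, 0, 1))  # [1]
```

Querying a training point returns that same point first, so one extra neighbour is requested and then discarded. With a single minority sample and `k_neighbors=1`, the search is asked for 2 neighbours and fails with the message quoted above.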

SqrtPapere · 2018-12-03T12:08:57Z

> ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6
> Also, I was trying to solve a multilabel problem (more than two classes). So, I used SMOTE c-1 times (where c is the number of classes).
>
> @Ayyappatheegala @fmfn @diego898

Are you sure that you need to apply it c times? I just passed a multi label X and Y in input and the output was new_X, new_Y with all the classes with same number of occurrences!

Maybe they just updated the package to support multi class?

glemaitre · 2018-12-03T12:38:11Z

This issue was closed and the comment is outdated. The documentation of SMOTE mentions that multiclass is supported in a one-vs-rest manner (automatically).
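
As a rough picture of what the automatic multiclass handling has to produce, the sketch below computes how many synthetic samples each class would need to reach the size of the largest class. This assumes the default behaviour equalises every class with the majority, which matches the observation above that all classes came out with the same number of occurrences. `synthetic_samples_needed` is a hypothetical helper, and the counts are the ones posted earlier in this thread.

```python
from collections import Counter

def synthetic_samples_needed(labels):
    """Per-class number of new samples needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items() if n < target}

# Label vector rebuilt from the class counts reported in this thread.
labels = ["A"] * 9 + ["B"] * 644 + ["C"] * 2 + ["D"] * 289
print(synthetic_samples_needed(labels))  # {'A': 635, 'C': 642, 'D': 355}
```

Each minority class is handled separately against the rest, which is why every one of them must have enough samples for the neighbour search on its own.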

pracaas · 2019-04-17T06:17:47Z

Can I use SMOTE for a multiclass classification problem?

glemaitre · 2019-04-17T06:26:28Z

> Can I use SMOTE for a multiclass classification problem?

yes

ballcap231 · 2019-10-14T16:32:28Z

> > That makes me doubtful if SMOTE can be used with regression
>
> Actually, you are right. SMOTE is designed for classification problems, not regression. All the methods in this package are for classification. We could extend it to regression, but we would need to find the right API. The literature on regression is also thinner.

Is there any intention to make this SMOTE package capable of working for regression in the near future? It seems to have already been implemented in R.
https://rdrr.io/cran/UBL/man/smoteRegress.html

glemaitre · 2019-10-16T15:58:20Z

Not for the moment.

glemaitre closed this as completed Jun 20, 2016

scikit-learn-contrib locked as resolved and limited conversation to collaborators Oct 16, 2019
