Issues using SMOTE #27
Comments
What exactly are you imputing? Do you mind sharing the shape of the dataset and distribution of each class? The error message seems pretty clear, and indicates you don't have enough samples in your minority class (I'm guessing you are using the |
I'm trying to apply the SMOTE algorithm to my dataset, which consists of 93 minority and 250 majority class points. Are there any constraints or preconditions for using your library? |
same issue here |
@Ayyappatheegala Sorry, I didn't see your message until today. No constraints, at least that I know about. I don't think it will work with sparse data, and to be honest, given the dimensionality of your dataset, KNN (hence SMOTE) is likely to fail. Regarding the error you are getting, it is hard for me to know what is happening from the ValueError alone. Perhaps you could share the full error message? The ValueError being raised is likely coming from scikit-learn; it might be due to the object misinterpreting the input data, or to the KNN object failing because of the extreme nature of the problem. |
Hi Fernando, it's again due to the format of the input supplied. @diego898: Please refer to issue #31 |
ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6 I got the same error even though my input format was correct (numpy arrays). The error pops up because one of the classes does not have enough samples. Here, one of the classes has just one sample, hence the error. I resolved this by taking more samples in my dataset. Also, I was trying to solve a multiclass problem (more than two classes), so I used SMOTE c-1 times (where c is the number of classes). |
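The check described in the comment above can be sketched in a few lines. This is only an illustration, not part of the library: the helper name `classes_too_small` and the label list are made up, and it assumes the default of 5 neighbours plus the sample itself, which matches the n_neighbors = 6 in the error message.

```python
from collections import Counter


def classes_too_small(y, k_neighbors=5):
    """Return {label: count} for classes with fewer than k_neighbors + 1 samples.

    Hypothetical helper: the +1 assumes the neighbour search also counts
    the query sample itself, as the error message in this thread suggests.
    """
    needed = k_neighbors + 1
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < needed}


# Illustrative labels only: class "C" has a single sample.
y = ["A"] * 9 + ["B"] * 250 + ["C"] * 1
print(classes_too_small(y))  # {'C': 1}
```

Running something like this on the target vector before resampling shows which class triggers the ValueError.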
Hi, I have a similar problem and would appreciate your suggestions to help resolve it. First, I am trying to solve a multiclass problem, but due to the imbalance I want to use SMOTE. Find below the instances of each class: Class A 9 Error message is |
Internally, you cannot run a KNN search for 6 neighbours on a class containing only 2 samples. |
So the point is that you need to have at least 6 samples in each class? I have the same error. |
Thanks Everybody. |
Yes. You can reduce the number of neighbours used. 6 is the default defined in the original paper. |
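As a rough sketch of "reduce the number of neighbours": one option is to derive the value from the smallest class. The helper `largest_usable_k` is hypothetical, the class sizes are illustrative, and the rule (smallest class size minus one, capped at 5) is an assumption based on the explanation in this thread rather than on the library's code.

```python
from collections import Counter


def largest_usable_k(y, default_k=5):
    """Pick the largest k such that k + 1 <= size of the smallest class.

    Hypothetical helper, not part of imbalanced-learn.
    """
    smallest = min(Counter(y).values())
    if smallest < 2:
        # A lone sample has no other point of its class to interpolate with.
        raise ValueError("smallest class has a single sample; no neighbour is available")
    return min(default_k, smallest - 1)


# Illustrative labels: 250 majority, 93 minority, and a tiny third class of 3.
y = [0] * 250 + [1] * 93 + [2] * 3
k = largest_usable_k(y)
print(k)  # 2
# The value could then be passed on, e.g. SMOTE(k_neighbors=k).
```

A very small k makes the synthetic samples depend on very few real points, so collecting more data for the rare class may still be the better fix.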
I have the x_val as (28,100) and y_val as (28,1) where 28 is the number of records and 100 are the features and 1 is the regression output for each record. But I still get the default error "ValueError: Expected n_neighbors <= n_samples, but n_samples = 3, n_neighbors = 6" |
@barthwalsamarth Apparently you have 3 samples in one of the classes but ask for 6 neighbours. |
@glemaitre But I don't have classes here. This is regression data: there are 100 features, all numeric, and there is one regression output. That makes me doubt whether SMOTE can be used for regression. |
Actually, you are right. SMOTE is designed for classification problems, not regression. All the methods in this package are for classification. We could extend it to regression, but we would need to find the right API. The literature on regression is also sparser. |
Hello all, I have read the full thread: I have converted my data into numpy array and used k_neighbors=1, but the algorithm still raises the following error: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2 It looks like it does not care about how many neighbors I decide to use to construct my synthetic samples. |
should be clear enough :) But be careful, because having |
I used k_neighbors=1. Why does the algorithm say: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2? |
This is not about the setup I am using (which is for testing the library), rather about the effective way the algorithm works. |
This error is raised by the nearest-neighbours estimator of scikit-learn, where the parameter is called n_neighbors and not k_neighbors or m_neighbors, which are specific to SMOTE. We could actually catch the error and raise a more appropriate one with the specific naming. PR welcome. Recalling the documentation:
Therefore, if the nearest-neighbors is given a single sample at fit and has to use the 5 nearest-neighbors (which are not there since we have a single sample), it leads to the given error. |
Actually this is important because you need to fulfill some minimum assumptions required by the algorithm. |
I think your answer is not correct: m_neighbors : int or object, optional (default=10) Secondly, I can definitely tell you that even if I set k_neighbors=1 and m_neighbors=1, the algorithm still raises the same error: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2 I am just trying to understand what exactly is not working. |
Nope, I did not misunderstand it.
Off the top of my head, the NN used in scikit-learn returns the query sample itself as its nearest neighbour, which is something we are not interested in, and therefore the number of neighbours is increased by one. But still, I really think that having a single data point is really a corner case. |
Ok, I think that this is the actual answer. So, the neighbors' value is augmented by one, and for this reason, it raises that error. Otherwise, I did not understand why the algorithm was complaining about n_neighbors=2, if I did not specify that anywhere. Thanks! |
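The "+1" can be illustrated with a toy brute-force neighbour search. This is a stand-in written for this thread, not scikit-learn's implementation: the function `kneighbors` and the points are hypothetical, and the error text simply mimics the message quoted above.

```python
import math


def kneighbors(points, query, n_neighbors):
    """Toy brute-force search returning indices of the closest points."""
    if n_neighbors > len(points):
        raise ValueError(
            f"Expected n_neighbors <= n_samples, but n_samples = {len(points)}, "
            f"n_neighbors = {n_neighbors}"
        )
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:n_neighbors]


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
k_neighbors = 1

# The query point is part of the fitted data, so it comes back first
# at distance 0. Asking for k_neighbors + 1 and dropping the first hit
# leaves the k real neighbours.
hits = kneighbors(minority, minority[0], k_neighbors + 1)
print(hits)      # [0, 1]
print(hits[1:])  # [1]
```

With a single minority sample, the same call asks for 2 neighbours out of 1 sample, which reproduces the reported message even though k_neighbors=1 was requested.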
Are you sure that you need to apply it c times? I just passed a multi label Maybe they just updated the package to support multi class? |
This issue was closed and the comment is outdated. The documentation of SMOTE mentions that multiclass is supported in a one-vs-rest manner (automatically). |
Can I use SMOTE for a multiclass classification problem? |
yes |
Is there any intention to make this SMOTE package capable of working for regression in the near future? It seems to have already been implemented in R. |
Not for the moment. |
Hi
First of all, thank you for providing us with this nice library.
I have an imbalanced dataset, which I've loaded using pandas.
When I supply the dataset as input to SMOTE, I get the following error:
Thanks in advance