Discussion:
Array out of bounds weka and postgres
Steven Shack
2013-02-12 04:39:33 UTC
I've got a naive Bayes classifier workflow in the KnowledgeFlow portion of Weka.
I can read small training and test sets in from a Postgres database fine, where small means up to 40K rows for training
and 30K rows for testing.

When I tried reading in my large test set (400K rows), I got a Java array index out of bounds exception and the workflow
stopped. However, I have no clue what is generating this error within my workflow, or whether it's caused by bad data being
read in from the database (and if so, which column and row?).

Given that there's no indication of where the problem lies, is there any way I can narrow this down a bit? I've
attached my workflow below. It works nicely for smaller datasets.


-- Java error.
Notifying data listeners (ClassAssigner)
Notifying data listeners (ClassAssigner)
In accept data set
Notifying listeners (training set maker)
java.lang.ArrayIndexOutOfBoundsException
Eibe Frank
2013-02-12 21:15:02 UTC
I have replaced the database loaders/savers with ARFF loaders/savers and run your flow on the MNIST data (60,000 training instances and 10,000 test instances). That worked fine.

Cheers,
Eibe
<arrayoutofbounds.kfml>
_______________________________________________
Wekalist mailing list
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Steven Shack
2013-02-13 05:36:22 UTC
So tracking this down a bit further, it looks like the array out of bounds error might be coming from the classifier.
Would having an unseen nominal value in the test set (remembering I have a much larger test set) cause this behaviour?

This is a frustrating issue to debug. I've got no visibility into where this error is coming from.
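
One way to get some visibility into the unseen-value question is to compare, column by column, the values that occur in the test rows against those that occur in the training rows. The following is only a minimal sketch in plain Java under stated assumptions: the rows are taken to be already fetched into string arrays, and the class name `NominalValueCheck`, the method `unseenValues`, and the sample columns are all hypothetical, not part of Weka. If both sets are already loaded as Weka `Instances`, `Instances.equalHeaders` is the library-level check of whether their structures match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka.
class NominalValueCheck {

    /**
     * For each column, collects the values that occur in the test rows
     * but never in the training rows. Columns without unseen values are
     * omitted; null cells (missing values) are ignored.
     */
    public static Map<String, Set<String>> unseenValues(String[] columns,
                                                        List<String[]> train,
                                                        List<String[]> test) {
        Map<String, Set<String>> unseen = new LinkedHashMap<>();
        for (int c = 0; c < columns.length; c++) {
            Set<String> known = new HashSet<>();
            for (String[] row : train) {
                known.add(row[c]);
            }
            Set<String> extra = new TreeSet<>();
            for (String[] row : test) {
                if (row[c] != null && !known.contains(row[c])) {
                    extra.add(row[c]);
                }
            }
            if (!extra.isEmpty()) {
                unseen.put(columns[c], extra);
            }
        }
        return unseen;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        String[] columns = {"color", "size"};
        List<String[]> train = Arrays.asList(
                new String[] {"red", "S"}, new String[] {"blue", "M"});
        List<String[]> test = Arrays.asList(
                new String[] {"red", "M"}, new String[] {"green", "S"});
        // prints {color=[green]}
        System.out.println(unseenValues(columns, train, test));
    }
}
```

An empty result means every test value was also seen in training, so the cause would have to be sought elsewhere.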
Mark Hall
2013-02-22 09:36:24 UTC
Post by Steven Shack
So tracking this down a bit further, it looks like the array out of
bounds error might be coming from the classifier.
Would having an unseen nominal value in the test set (remembering I have
a much larger test set) cause this behaviour?
Yes. Classifiers expect the test data to have exactly the same structure
as the training data (this includes number and order of nominal values).
Typically no checks are done in order to facilitate fast prediction.

Cheers,
Mark.
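
To illustrate why an unchecked structure mismatch surfaces as a bare ArrayIndexOutOfBoundsException, here is a toy sketch in plain Java. It is not Weka's actual implementation: the class `ToyNominalCounter` and its methods are hypothetical, and they only mimic the general pattern of a per-value table that is sized at training time and then indexed directly by a nominal value's index.

```java
// Toy model only; names are hypothetical and this is not Weka code.
class ToyNominalCounter {
    // One slot per nominal value declared in the training data.
    private final int[] counts;

    ToyNominalCounter(int numTrainingValues) {
        this.counts = new int[numTrainingValues];
    }

    /** Records one training observation of the value at valueIndex. */
    void observe(int valueIndex) {
        counts[valueIndex]++;
    }

    /**
     * Unchecked lookup: trusts that valueIndex refers to a value that
     * was declared in the training data. Skipping the range check keeps
     * the lookup cheap, at the cost of an uninformative failure.
     */
    int countFor(int valueIndex) {
        return counts[valueIndex];
    }

    public static void main(String[] args) {
        // Training data declared two values, with indices 0 and 1.
        ToyNominalCounter counter = new ToyNominalCounter(2);
        counter.observe(0);
        counter.observe(1);
        try {
            // The test data declares a third value at index 2.
            counter.countFor(2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed for an index unseen in training");
        }
    }
}
```

In this toy model the exception carries no column name or row number, which matches the lack of context in the log above.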