Error with Weka while creating an Instance for classification

Hi

I'm new to Weka, so forgive me if my questions have obvious answers. I have
looked around for answers to these questions, but if I've missed them feel
free to point me in the right direction.
I am using Weka to cluster some documents represented as hashmaps in the
code below (the hashmaps are also being used as input to another
clustering package). To do this I use parts from
"http://weka.sourceforge.net/wiki/index.php/Creating_an_ARFF_file" to
create an ARFF file on the fly, so to speak. What I would like to know is:
is this the right way to do it, and is there a way to name the different
instances so that I can find out which instance is in which cluster?

I've added some code so that you can see what I've done.

Thanks in advance
Daniel

Exec ex = new Exec();
Object[] words = ex.docs.getAllWords();

atts = new FastVector();

for (int j = 0; j < words.length; j++) {
    atts.addElement(new Attribute("att" + j));
}

data = new Instances("MyRelation", atts, 0);

for (int i = 0; i < ex.docs.numOfDocuments(); i++) {
    Document doc = ex.docs.getDocument(i);
    vals = new double[data.numAttributes()];
    for (int j = 0; j < vals.length; j++) {
        if (doc.hashMapVector.containsKey(words[j])) {
            vals[j] = doc.hashMapVector.get(words[j].toString());
        } else {
            vals[j] = 0;
        }
    }
    data.add(new Instance(1.0, vals));
}

Peter Reutemann

2007-01-16 20:16:18 UTC

Post by Daniel Jansson
I am using Weka to cluster some documents represented as hashmaps in the
code below (the hashmaps are also being used as input to another
clustering package).

Normally, one would probably use the StringToWordVector filter to turn
the documents into word counts (the filter has a few more options):
weka.filters.unsupervised.attribute.StringToWordVector

Post by Daniel Jansson
To do this I use parts from
"http://weka.sourceforge.net/wiki/index.php/Creating_an_ARFF_file" to
create an ARFF file on the fly, so to speak. What I would like to know is:
is this the right way to do it

I see nothing wrong with that.

Post by Daniel Jansson
and is there a way to name the different
instances so that I can find out which instance is in which cluster?

Not directly, but there are a few ways to do it:
1. Use the AddCluster filter to add a new attribute to the dataset
indicating to which cluster an instance belongs.

weka.filters.unsupervised.attribute.AddCluster

2. Add a unique ID to your instances via the AddID filter and then use
the FilteredClusterer in combination with the Remove filter to
remove the ID attribute again for the clusterer, so that the
clusterer does not get additional information. In the calling code
you'll still be able to see the IDs and see which cluster they got
added to.

weka.filters.unsupervised.attribute.AddID

weka.clusterers.FilteredClusterer
|
|- weka.filters.unsupervised.attribute.Remove
|
|- <your base clusterer>

HTH

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174

Daniel Jansson

2007-04-04 08:13:51 UTC

Post by Daniel Jansson
and is there a way to name the different
instances so that I can find out which instance is in which cluster?

1. use the AddCluster filter to add a new attribute to the dataset
indicating to which cluster an instance belongs.
weka.filters.unsupervised.attribute.AddCluster
2. Add a unique ID to your instances via the AddID filter and then use
the FilteredClusterer in combination with the Remove filter to
remove the ID attribute again for the clusterer to avoid the
clusterer from getting additional information. In the calling code
you'll be still able to see the IDs and see what cluster they got
added to.
weka.filters.unsupervised.attribute.AddID
weka.clusterers.FilteredClusterer
|
|- weka.filters.unsupervised.attribute.Remove
|
|- <your base clusterer>

I asked this question a couple of months ago and got this response. What I
would like to know now is whether there is a tutorial or something similar
for using AddCluster or AddID. I've found nothing.

Thanks
Daniel

Peter Reutemann

2007-04-04 12:26:45 UTC

Post by Daniel Jansson

Post by Daniel Jansson
and is there a way to name the different
instances so that I can find out which instance is in which cluster?

I asked this question a couple of months ago and got this response. What I
would like to know now is whether there is a tutorial or something similar
for using AddCluster or AddID. I've found nothing.

AddCluster and AddID are just like any other filter in Weka. You can check
out the online help available through the "More" button in the GUI when
you're displaying the filter's properties, or read the filter's Javadoc,
or get hold of the Data Mining book by Witten&Frank:
http://www.cs.waikato.ac.nz/~ml/weka/book.html

Have you checked out the Weka primer already (for commandline use of
filters)?
http://weka.sourceforge.net/wekadoc/index.php/en:Primer

Or any documentation on the WekaDoc Wiki corresponding to your Weka version?
http://weka.sourceforge.net/wekadoc/

HTH

Cheers, Peter


Tatdow Pansombut

2007-04-04 18:20:20 UTC

Hello,
I need to convert a CSV file to ARFF, so I use the
command:

java -cp /Users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.core.converters.CSVLoader Input/CPDBAS_v3b.csv > Input/CPDBAS_v3b.arff

I looked at CPDBAS_v3b.arff and it seems to have the
right ARFF format (it is a very large file, so I do not
know for sure). But when I try to run the filter with
this command:

java -cp /users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.filters.unsupervised.attribute.ReplaceMissingValues \
  -i Input/CPDBAS_v3b.arff -o Input/CPDBAS_v3b_R.arff

I get the following error:

java.io.IOException: nominal value not declared in header, read Token[N=C], line 1
    at weka.core.Instances.errms(Unknown Source)
    at weka.core.Instances.getInstanceFull(Unknown Source)
    at weka.core.Instances.getInstance(Unknown Source)
    at weka.core.Instances.readInstance(Unknown Source)
    at weka.core.converters.ArffLoader.getNextInstance(Unknown Source)
    at weka.core.converters.ConverterUtils$DataSource.hasMoreElements(Unknown Source)
    at weka.filters.Filter.filterFile(Unknown Source)
    at weka.filters.Filter.runFilter(Unknown Source)
    at weka.filters.unsupervised.attribute.ReplaceMissingValues.main(Unknown Source)

I don't think the conversion from CSV to ARFF is
incorrect, but I cannot figure out what has gone wrong.
Please help.

Note that I don't have any problem loading the CSV file
with the GUI, but I need to run Weka from the command line
instead of the GUI because I need to be able to use the -g option
to see the graph (the graph visualization freezes and
I cannot solve the problem).

Ann

Peter Reutemann

2007-04-04 21:29:06 UTC

Post by Tatdow Pansombut
I need to convert a CSV file to ARFF, so I use the
command:
java -cp /Users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.core.converters.CSVLoader Input/CPDBAS_v3b.csv > Input/CPDBAS_v3b.arff
I looked at CPDBAS_v3b.arff and it seems to have the
right ARFF format (it is a very large file, so I do not
know for sure). But when I try to run the filter with
java -cp /users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.filters.unsupervised.attribute.ReplaceMissingValues \
  -i Input/CPDBAS_v3b.arff -o Input/CPDBAS_v3b_R.arff
java.io.IOException: nominal value not declared in header, read Token[N=C], line 1
    at weka.core.Instances.errms(Unknown Source)
    at weka.core.Instances.getInstanceFull(Unknown Source)
    at weka.core.Instances.getInstance(Unknown Source)

...

The "line 1" error is a bit strange. Try replacing the first line
containing "@relation ..." with "@relation blah" to make sure that you
have a valid relation name, and try running the filter again:

cat Input/CPDBAS_v3b.arff | \
sed s/"@relation .*"/"@relation blah"/g > test.arff

You could also try extracting the first couple of lines of the ARFF file
and check whether you can load that correctly (and whether it is a correct
ARFF file). Just create a new file ("header.sh") and paste the following
lines in there:

#!/bin/bash
#
# this will output the header of an ARFF file and
# the first two lines of data
#
# line number of the first "@data" marker (ARFF keywords are case-insensitive)
LINE=$(grep -n -i -m 1 "^@data" "$1" | cut -f1 -d:)
head -n $((LINE + 2)) "$1"

Make it executable ("chmod a+x header.sh") and run it with the ARFF file
as parameter:

./header.sh Input/CPDBAS_v3b.arff
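
If no Unix shell is at hand, roughly the same check can be sketched in plain Java. The class name ArffHead and its method are made up for this illustration (they are not part of Weka), and the sketch assumes the "@data" marker starts a line, ignoring case and leading whitespace:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka): header plus first rows of an ARFF file. */
final class ArffHead {

    /**
     * Returns every line up to and including the "@data" marker,
     * followed by at most `rows` of the lines after it.
     * If no "@data" line is found, all lines are returned.
     */
    static List<String> head(List<String> lines, int rows) {
        List<String> out = new ArrayList<>();
        int remaining = -1;                       // -1: still inside the header
        for (String line : lines) {
            if (remaining == 0) {
                break;                            // enough data rows collected
            }
            out.add(line);
            if (remaining > 0) {
                remaining--;
            } else if (line.trim().regionMatches(true, 0, "@data", 0, 5)) {
                remaining = rows;                 // start counting data rows
            }
        }
        return out;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines(Paths.get("Input/CPDBAS_v3b.arff")); for a very large file, reading line by line and stopping early would be kinder to memory.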

Post by Tatdow Pansombut
Note that I don't have any problem loading csv file
with the GUI but I need to run weka using command line
instead of GUI b/c I need to be able to use -g option
to see the graph (The Graph Virtualization freeze and
I cannot solve the problem).

If you can load the CSV file in the Explorer, what happens if you save it
from there as an ARFF file? Does that file then work with the
ReplaceMissingValues filter in the console?

HTH

Cheers, Peter


Tatdow Pansombut

2007-04-11 00:54:35 UTC

I am running two Bayesian network classifiers from the
command line and get an error. This does not happen
when I run Weka using the GUI, but with the GUI I cannot see
the graph (it freezes). Here are the commands and the
errors:

tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2048m \
  -cp /users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.classifiers.bayes.BayesNet -t header_RD.arff -g -D \
  -Q weka.classifiers.bayes.net.search.local.K2 -- -P 2 -S BAYES \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0
java.lang.NegativeArraySizeException
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScorePlain(Unknown Source)
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScore(Unknown Source)
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcScoreWithExtraParent(Unknown Source)
    at weka.classifiers.bayes.net.search.local.K2.buildStructure(Unknown Source)
    at weka.classifiers.bayes.BayesNet.buildStructure(Unknown Source)
    at weka.classifiers.bayes.BayesNet.buildClassifier(Unknown Source)
    at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
    at weka.classifiers.Classifier.runClassifier(Unknown Source)
    at weka.classifiers.bayes.BayesNet.main(Unknown Source)

tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2048m \
  -cp /users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.classifiers.bayes.BayesNet -t header_RD.arff -g -D \
  -Q weka.classifiers.bayes.net.search.local.HillClimber -- -P 2 -R -N -S BAYES \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0
java.lang.NegativeArraySizeException
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScorePlain(Unknown Source)
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScore(Unknown Source)
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcScoreWithExtraParent(Unknown Source)
    at weka.classifiers.bayes.net.search.local.HillClimber.updateCache(Unknown Source)
    at weka.classifiers.bayes.net.search.local.HillClimber.applyArcAddition(Unknown Source)
    at weka.classifiers.bayes.net.search.local.HillClimber.performOperation(Unknown Source)
    at weka.classifiers.bayes.net.search.local.HillClimber.search(Unknown Source)
    at weka.classifiers.bayes.net.search.SearchAlgorithm.buildStructure(Unknown Source)
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.buildStructure(Unknown Source)
    at weka.classifiers.bayes.BayesNet.buildStructure(Unknown Source)
    at weka.classifiers.bayes.BayesNet.buildClassifier(Unknown Source)
    at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
    at weka.classifiers.Classifier.runClassifier(Unknown Source)
    at weka.classifiers.bayes.BayesNet.main(Unknown Source)

I think there is a problem with an array size, but I do not
know what could have caused it. Please help. Thank
you.

Ann

Peter Reutemann

2007-04-11 01:48:31 UTC

Post by Tatdow Pansombut
I am running two Bayesian Networks Classifier from
command line and get some error. This does not happen
when I run weka using GUI but with GUI I cannot see
the graph (it freezes).

Any error messages in the console?

Post by Tatdow Pansombut
Here are the command and the error:
tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2048m \
  -cp /users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.classifiers.bayes.BayesNet -t header_RD.arff -g -D \
  -Q weka.classifiers.bayes.net.search.local.K2 -- -P 2 -S BAYES \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0
java.lang.NegativeArraySizeException
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScorePlain(Unknown Source)

...

Post by Tatdow Pansombut
tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2048m \
  -cp /users/Ann/Documents/Bayesian_toolkits/weka-3.5.5/weka.jar \
  weka.classifiers.bayes.BayesNet -t header_RD.arff -g -D \
  -Q weka.classifiers.bayes.net.search.local.HillClimber -- -P 2 -R -N -S BAYES \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0
java.lang.NegativeArraySizeException
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScorePlain(Unknown Source)
    at weka.classifiers.bayes.net.search.local.LocalScoreSearchAlgorithm.calcNodeScore(Unknown Source)

...

Post by Tatdow Pansombut
I think there is problem with array size but I do not
know what could have cause it.

I ran both of your setups on the anneal UCI dataset and they work
without any problem. Are you sure you used the same dataset in the GUI
and from the command line?

Cheers, Peter


Tatdow Pansombut

2007-04-17 19:57:06 UTC

Hello,
I am using the following command to create a data set:

tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2108m \
  -classpath /Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5/weka.jar \
  weka.classifiers.bayes.net.BayesNetGenerator -N 20 -A 60 -C 4 -M 1400

And get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: weka/classifiers/bayes/net/BayesNetGenerator

My weka.jar resides in
/Users/Ann/Documents/Bayesian_toolkits/weka3-5-5

Please help. Thank you.

Ann

Peter Reutemann

2007-04-17 20:49:32 UTC

Post by Tatdow Pansombut
tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2108m \
  -classpath /Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5/weka.jar \
  weka.classifiers.bayes.net.BayesNetGenerator -N 20 -A 60 -C 4 -M 1400
Exception in thread "main" java.lang.NoClassDefFoundError: weka/classifiers/bayes/net/BayesNetGenerator
My weka.jar resides in
/Users/Ann/Documents/Bayesian_toolkits/weka3-5-5

/Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5
(your java call)

is not the same as

/Users/Ann/Documents/Bayesian_toolkits/weka3-5-5
(where your weka.jar is located)

You missed a hyphen before "3-5-5".

Cheers, Peter


Tatdow Pansombut

2007-04-17 23:55:12 UTC

Post by Peter Reutemann
/Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5
(your java call)
is not the same as
/Users/Ann/Documents/Bayesian_toolkits/weka3-5-5
(where your weka.jar is located)
You missed a hyphen before "3-5-5".

Sorry, I mistyped in the e-mail; the weka.jar resides
at

/Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5

So this still does not work:

tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2108m \
  -cp /Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5/weka.jar \
  weka.classifiers.bayes.net.BayesNetGenerator -N 20 -A 60 -C 4 -M 1400
Exception in thread "main" java.lang.NoClassDefFoundError: weka/classifiers/bayes/net/BayesNetGenerator
tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$

Peter Reutemann

2007-04-18 00:59:22 UTC

Post by Tatdow Pansombut
Sorry, I mistyped in the e-mail; the weka.jar resides
at
/Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5
tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$ java -Xmx2108m \
  -cp /Users/Ann/Documents/Bayesian_toolkits/weka-3-5-5/weka.jar \
  weka.classifiers.bayes.net.BayesNetGenerator -N 20 -A 60 -C 4 -M 1400
Exception in thread "main" java.lang.NoClassDefFoundError: weka/classifiers/bayes/net/BayesNetGenerator
tatdow-pansombuts-computer:~/Documents/Bayesian_toolkits/weka-3.5.5 Ann$

OK, take a closer look: you have "-3.5.5" in your path, but you use
"-3-5-5" in your classpath. Hence the Exception.
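
The JVM silently skips classpath entries that do not exist, which is why a mistyped jar path shows up as a NoClassDefFoundError rather than as a "file not found" message. A small sketch for spotting such entries follows; the class name ClasspathCheck is made up for this illustration and is not part of Weka:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Weka): finds classpath entries that do not exist. */
final class ClasspathCheck {

    /** Splits a classpath string on the platform separator and returns the entries missing on disk. */
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            // Wildcard entries such as "lib/*" are skipped; only plain paths are checked.
            if (entry.isEmpty() || entry.endsWith("*")) {
                continue;
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // With no argument, inspect the classpath this JVM was started with.
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : missingEntries(cp)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

Run with the same string that was passed to -cp, it should print the weka.jar path as missing whenever the directory name in that path does not match the one on disk.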

Cheers, Peter


Tatdow Pansombut

2007-04-18 03:00:30 UTC

Thank you so much.

Ann


Peter Reutemann

2007-04-04 20:09:48 UTC

Can WEKA handle a dataset of more than one million instances with 2 or 3
instances?

Do you mean "2 or 3 attributes"?

I have a large dataset that I am having hard time to classify?

What's the problem with it? With such large datasets you need either a big
machine (best to have a 64-bit one with lots of RAM) or an incremental
classifier. Classifiers implementing the
weka.classifiers.UpdateableClassifier interface can process data
incrementally and therefore have a smaller memory footprint (check out
the Javadoc for this interface to see which classifiers implement
it).

BTW, if you're posting a new topic, please start a new email with a new
subject instead of replying to an old one without changing the subject.
Otherwise those messages get sorted incorrectly and they'll be hard to
find in the mailing list archives.

Cheers, Peter


Epa Uwimana

2007-04-05 02:21:17 UTC

I have a very large dataset of over one million instances and 3
attributes (thanks to Peter).
I want to classify this dataset using WEKA and I have a feeling that
it would be too big.

Peter is suggesting using the weka.classifiers.UpdateableClassifier
and I am willing to try it, but I was wondering if anybody has used
this before and how they liked it. Also, how do I use it? I am sorry if
this sounds basic, but I am new to the list and I would
appreciate any help.

Thanks

Peter Reutemann

2007-04-05 02:27:57 UTC

Post by Epa Uwimana
I have a very large dataset of over one million instances and 3
attributes (thanks to Peter).
I want to classify this dataset using WEKA and I have a feeling that
it would be too big.
Peter is suggesting using the weka.classifiers.UpdateableClassifier
and I am willing to try it but I was wondering if anybody has used
this before and how they liked it. Also,how do I use it?

"UpdateableClassifier" is just an interface that certain classifiers
implement. Check out the Javadoc of this interface, it lists what
classifiers implement it and therefore support incremental training
(best to do that from commandline!).

You don't have to do anything special, just treat it as any other
classifier.

HTH

Cheers, Peter


Epa Uwimana

2007-04-08 15:29:22 UTC

Is there any limit when it comes to the number of attributes that a
dataset can have? I have a dataset of 400 instances and each instance
could have up to ~20000 attributes. Any help on how I can put this kind
of data in ARFF would be helpful.

Thanks
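
One option that may suit such wide data is ARFF's sparse row format, in which each data line lists only the non-zero values as {index value, ...} with 0-based attribute indices. This assumes most of the ~20000 values are zero in any one instance, which the post does not say. A minimal sketch in plain Java; the class name SparseArff is made up for this illustration and is not part of Weka:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Weka): formats one sparse ARFF data line. */
final class SparseArff {

    /**
     * Formats numeric values keyed by 0-based attribute index as a sparse
     * ARFF row such as "{1 2.0, 7 0.5}". Zero values are left out because
     * sparse ARFF treats omitted entries as 0.
     */
    static String sparseRow(Map<Integer, Double> values) {
        StringBuilder sb = new StringBuilder("{");
        // Copy into a TreeMap so the indices are written in ascending order.
        for (Map.Entry<Integer, Double> e : new TreeMap<Integer, Double>(values).entrySet()) {
            if (e.getValue() == 0.0) {
                continue;
            }
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(' ').append(e.getValue());
        }
        return sb.append('}').toString();
    }
}
```

The @relation and @attribute header lines are written as in a dense file; only the lines after @data change.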

Daniel Jansson

2007-04-10 15:32:19 UTC

Post by Daniel Jansson
and is there a way to name the different
instances so that I can find out which instance is in which cluster?

When I use the filter AddCluster I get the same results that I get when I
use the getClusterAssignments() method in ClusterEvaluation. Since I know
in what order I added the instances I guess this tells me in which cluster
each instance is placed. Am I on the right track here?

I've added some code so that you can see what I've done.

ClusterEvaluation eval = new ClusterEvaluation();
SimpleKMeans skm = new SimpleKMeans();
skm.buildClusterer(data);
eval.setClusterer(skm);
eval.evaluateClusterer(new Instances(data));
System.out.println(eval.getClusterAssignments().length);

for (int i = 0; i < eval.getClusterAssignments().length; i++) {
    System.out.println("instance: " + i + " in cluster " +
        eval.getClusterAssignments()[i]);
}

Peter Reutemann

2007-04-10 21:40:22 UTC

Post by Daniel Jansson
When I use the filter AddCluster I get the same results that I get when I
use the getClusterAssignments() method in ClusterEvaluation. Since I know
in what order I added the instances I guess this tells me in which cluster
each instance is placed. Am I on the right track here?
I've added some code so that you can see what I've done.
ClusterEvaluation eval = new ClusterEvaluation();
SimpleKMeans skm = new SimpleKMeans();
skm.buildClusterer(data);
eval.setClusterer(skm);
eval.evaluateClusterer(new Instances(data));
System.out.println(eval.getClusterAssignments().length);
for(int i=0;i<eval.getClusterAssignments().length; i++){
System.out.println("instance: " + i + " in cluster " +
eval.getClusterAssignments()[i]);
}

It should be fine, even though the clusterer shouldn't be trained BEFORE
it gets passed on to the evaluation class (the training happens in there).
But I wouldn't have bothered using the ClusterEvaluation class for that; I
would have just used the "clusterInstance(Instance)" method.
http://weka.sourceforge.net/wiki/index.php/Use_Weka_in_your_Java_code#Clustering_instances
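
Since the assignments come back in the order the instances were added, a parallel list of document names kept in that same order is enough to name the cluster members. A minimal sketch in plain Java; the class ClusterNames is made up for this illustration, and the double[] parameter assumes that the cluster numbers arrive as doubles, as they appear to in the code above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Weka): groups document names by cluster number. */
final class ClusterNames {

    /**
     * names.get(i) must be the name of the i-th instance added to the dataset,
     * and assignments[i] the cluster number reported for that instance.
     */
    static Map<Integer, List<String>> group(List<String> names, double[] assignments) {
        if (names.size() != assignments.length) {
            throw new IllegalArgumentException("need exactly one name per instance");
        }
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            int cluster = (int) assignments[i];
            clusters.computeIfAbsent(cluster, k -> new ArrayList<>()).add(names.get(i));
        }
        return clusters;
    }
}
```

This only works as long as nothing reorders or drops instances between building the dataset and clustering it; if that cannot be guaranteed, the AddID route is the safer one.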

HTH

Cheers, Peter


Peter Reutemann

2007-01-16 19:59:28 UTC

Post by H M Ishrar Hussain
I have been trying to make up an Instance object for classification
Attribute abcd = new Attribute("abcd");
Instance xyz = new Instance(2);
xyz.setValue(abcd, 1.0);
...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at weka.core.Instance.setValue(Instance.java:643)
at weka.core.Instance.setValue(Instance.java:716)
As I trace back, I find setValue() method of the Instance class calls
index() method of the Attribute class that always returns "-1" as the
index of the attribute. As a result it throws
ArrayOutOfBoundException.

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174

H M Ishrar Hussain

2007-01-16 23:52:24 UTC

Hi Peter,

Thanks for your reply, but my problem is that I intend to classify one
Instance at a time using the classifyInstance() method, and for that I
only need to create one Instance object, not a complete dataset. I
have already trained my classifier with the dataset loaded from the
.arff file.

If you look closely at the code I sent you earlier, you will find that
I have not yet added the row to any dataset. I am receiving this array
index out of bounds exception at the time of defining the "row"
(Instance) only. I am following the style of coding used in the
following link:

http://weka.sourceforge.net/wiki/index.php/Programmatic_Use

I have already checked the link you gave me, but it explains building
a complete dataset ("Instances" object) on the fly, which won't serve
the purpose if I am to use classifyInstance(), which expects one
Instance object as parameter.

As you mentioned setting up the headers, I also tried using the
setDataset() method as follows:

Instances dataset = new Instances(
    new BufferedReader(
        new FileReader("training.arff")));
dataset.setClassIndex(dataset.numAttributes() - 1);

Attribute abcd = new Attribute("abcd");
Instance xyz = new Instance(2);
xyz.setDataset(dataset);
xyz.setValue(abcd, 1.0);
....

But the same error with setValue() still persisted:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at weka.core.Instance.setValue(Instance.java:643)
    at weka.core.Instance.setValue(Instance.java:716)

I am really confused about what I am doing wrong, and I would very much
appreciate your help. Thanks.

Regards,
H M Ishrar Hussain
Department of Computer Science and Software Engineering
Concordia University
Montreal, Canada

Post by Peter Reutemann
Your problem is that you have to create the header/structure of your
data first ("Instances"), before you can add rows ("Instance").
Have a look at this Wiki article, it explains how to create datasets on
the fly:
http://weka.sourceforge.net/wiki/index.php/Creating_Instances_on-the-fly

Peter Reutemann

2007-01-17 01:24:21 UTC

Post by H M Ishrar Hussain
Thanks for your reply, but my problem is that I intend to classify one
Instance at a time using the classifyInstance() method, and for that I
only need to create one Instance object, not a complete dataset. I
have already trained my classifier with the dataset loaded from the
.arff file.

An instance/row always needs a reference to a header; only the header
stores the information about the attributes.

Post by H M Ishrar Hussain
If you look closely at the code I sent you earlier, you will find that
I have not yet added the row to any dataset. I am receiving this array
index out of bounds exception at the time of defining the "row"
(Instance) only. I am following the style of coding used in the
following link:
http://weka.sourceforge.net/wiki/index.php/Programmatic_Use
I have already checked the link you gave me, but it explains building
a complete dataset ("Instances" object) on the fly, which won't serve
the purpose if I am to use classifyInstance(), which expects one
Instance object as parameter.
As you mentioned setting up the headers, I also tried using the
setDataset() method as follows:
Instances dataset = new Instances(
    new BufferedReader(
        new FileReader("training.arff")));
dataset.setClassIndex(dataset.numAttributes() - 1);
Attribute abcd = new Attribute("abcd");
Instance xyz = new Instance(2);
xyz.setDataset(dataset);
xyz.setValue(abcd, 1.0);
....
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at weka.core.Instance.setValue(Instance.java:643)
    at weka.core.Instance.setValue(Instance.java:716)

OK, here's a bit of code to explain the generation of a single row:

import weka.core.*;
import java.io.*;

public class InstTest {
    /**
     * takes a filename as first argument
     */
    public static void main(String[] args) throws Exception {
        Instances dataset = new Instances(
            new BufferedReader(
                new FileReader(args[0])));
        dataset.setClassIndex(dataset.numAttributes() - 1);

        Instance xyz = new Instance(dataset.numAttributes()); // [1]
        xyz.setDataset(dataset); // [2]
        xyz.setValue(dataset.attribute(0), 1.0); // [3]
        System.out.println(xyz);
    }
}

[1]
Create a new row with the number of attributes that are in the original
dataset (i.e., the training set).

[2]
Set the dataset, otherwise the row/instance has no idea about the
attributes and their types.

[3]
Set the value of an attribute. *Important:* use an attribute of the
dataset we just set, i.e., the training set. Otherwise the index in the
values array of the row/instance cannot be determined. And this is what
happens in your code, since the attribute abcd doesn't exist in the
attributes of "dataset". Alternatively, you can also use the index
instead of the Attribute reference:
xyz.setValue(0, 1.0); // [3]
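
The mechanism behind the -1 can be shown with a toy model in plain Java. These classes are invented for illustration and are NOT Weka's; they only mimic the idea that an attribute learns its position when it becomes part of a header, so an attribute created on its own keeps the index -1:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, NOT Weka's classes: shows why a stand-alone attribute has no usable index. */
final class ToyHeaderDemo {

    static final class ToyAttribute {
        final String name;
        int index = -1;                  // unknown until the attribute joins a header

        ToyAttribute(String name) {
            this.name = name;
        }
    }

    static final class ToyHeader {
        final List<ToyAttribute> attributes = new ArrayList<>();

        ToyAttribute add(String name) {
            ToyAttribute a = new ToyAttribute(name);
            a.index = attributes.size(); // position in the header becomes the index
            attributes.add(a);
            return a;
        }
    }

    static final class ToyRow {
        final double[] values;

        ToyRow(ToyHeader header) {
            values = new double[header.attributes.size()];
        }

        void setValue(ToyAttribute a, double v) {
            values[a.index] = v;         // an index of -1 fails here
        }
    }
}
```

Setting a value through an attribute returned by the toy header works, while passing a freshly constructed attribute throws ArrayIndexOutOfBoundsException, mirroring the stack trace in the question.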

HTH

Cheers, Peter


H M Ishrar Hussain

2007-01-18 08:49:01 UTC

Thank you so very much, Peter, for your in-depth explanation. I got it now.

Regards,
Ishrar


Le Thi Kim

2007-04-04 20:07:38 UTC

There are a few simple steps to convert a CSV file to ARFF.

1- Open the CSV file in Notepad.
2- Add @relation ..... as in the book (machine learning tools:...
guidance), then save it to a new file or the same file. It doesn't
matter.
3- Rename that file with the extension .arff instead of .csv. Accept
everything.
It's done.

Hope you can manage,

Cheers,

Thoa Le


Fábio César Medeiros

2007-04-04 21:15:43 UTC