Tuesday, November 25, 2008
Primitive Java Collections (primitive hashset, etc.)
Java's built-in collections such as HashMap and HashSet are not efficient when you have a lot of data (e.g., 1 million items). Everything in a HashMap (its keys and values) and in a HashSet (its elements) is stored as an object, and objects are expensive to store and access. So if you want to store, say, 1 million ints in a set, use a primitive hash set instead, such as the ones in Trove or Colt. They seem to be some of the best primitive collections libraries, but I haven't compared their performance (memory consumption and access speed), so I don't know which one is better.
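To make the idea concrete, here is a minimal sketch of what a primitive int hash set does internally. This is not Trove or Colt code; the class name IntHashSetSketch, the hashing step, and the 0.5 load factor are my own assumptions for illustration. The point is that the values live in a plain int[] array, so no Integer object is created per element.

```java
/** Illustrative open-addressing hash set for ints (hypothetical class, not from Trove or Colt). */
final class IntHashSetSketch {
    private int[] keys;      // raw int slots, no Integer boxing
    private boolean[] used;  // marks which slots are occupied
    private int size;

    IntHashSetSketch(int expectedSize) {
        int capacity = 16;
        // Keep capacity a power of two and at least twice the expected size.
        while (capacity < expectedSize * 2L) {
            capacity <<= 1;
        }
        keys = new int[capacity];
        used = new boolean[capacity];
    }

    /** Scrambles the value and maps it to a slot index. */
    private static int slotFor(int value, int capacity) {
        int h = value * 0x9E3779B9;
        return (h ^ (h >>> 16)) & (capacity - 1);
    }

    /** Adds the value; returns false if it was already present. */
    boolean add(int value) {
        if ((size + 1) * 2 > keys.length) {
            resize();
        }
        int i = slotFor(value, keys.length);
        while (used[i]) {
            if (keys[i] == value) {
                return false;
            }
            i = (i + 1) & (keys.length - 1); // linear probing
        }
        used[i] = true;
        keys[i] = value;
        size++;
        return true;
    }

    boolean contains(int value) {
        int i = slotFor(value, keys.length);
        while (used[i]) {
            if (keys[i] == value) {
                return true;
            }
            i = (i + 1) & (keys.length - 1);
        }
        return false;
    }

    int size() {
        return size;
    }

    /** Doubles the table and re-inserts every stored value. */
    private void resize() {
        int[] oldKeys = keys;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        used = new boolean[oldUsed.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                add(oldKeys[i]);
            }
        }
    }
}
```

In real code I would still reach for one of the libraries above rather than this sketch, since they also handle removal, iteration, and the other primitive types.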
Tuesday, November 18, 2008
Downloading full CiteSeerX data
Here is, in my opinion, the easiest way to download the full dataset from CiteSeerX. (Note that CiteSeer is the older version, which is no longer updated.)
Steps for downloading the full dataset from CiteSeerX:
- Download and extract the "Demo" from http://www.oclc.org/research/software/oai/harvester.htm
- Go to the directory of the extracted files and run the following command to download the full CiteSeerX dataset to the file "citeseerx_alldata.xml" (the ";" classpath separator is for Windows; on Linux or Mac OS X use ":" instead):
java -classpath .;oaiharvester.jar;xerces.jar org.acme.oai.OAIReaderRawDump http://citeseerx.ist.psu.edu/oai2 -o citeseerx_alldata.xml
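If you would rather not worry about the classpath separator, the same command can be assembled from Java. This is only a sketch: the class name HarvestCommandSketch and its buildCommand methods are made up for illustration, and it assumes the two jars sit in the current directory exactly as in the command above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the harvester command line shown above. */
final class HarvestCommandSketch {

    /** Builds the command using an explicit classpath separator (";" or ":"). */
    static List<String> buildCommand(String pathSeparator, String outputFile) {
        String classpath = String.join(pathSeparator, ".", "oaiharvester.jar", "xerces.jar");
        return Arrays.asList(
                "java",
                "-classpath", classpath,
                "org.acme.oai.OAIReaderRawDump",
                "http://citeseerx.ist.psu.edu/oai2",
                "-o", outputFile);
    }

    /** Builds the command using the separator of the platform this code runs on. */
    static List<String> buildCommand(String outputFile) {
        return buildCommand(File.pathSeparator, outputFile);
    }
}
```

The resulting list can be handed to new ProcessBuilder(command).inheritIO().start() from inside the extracted directory to run the harvest.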