Dear,

If you need a CCM program, there is one at http://groups.csail.mit.edu/rbg/code/multiling_induction/. However, the code is written in OCaml, and I found it quite difficult to compile (I even had to fix some lines of code and infer the structure of the configuration file and the makefile). I hope the notes below save you a lot of time.

1. Install GODI (http://godi.camlcity.org/godi/index.html).

2. Install GSL, the GNU Scientific Library.

3. Run godi_console, then install the following packages:
[144] godi-ocaml 3.11.2 3.11.2 The core of the OCaml system
[ 40] base-pcre 7.7#1 7.7#1 The version of PCRE for GODI
[ 62] conf-pcre 6 6 Configures which pcre library
[ 81] godi-batteries 1.3.0 1.3.0 a community-maintained founda
[ 95] godi-camomile 0.7.1#7 0.7.1#7 Camomile is a comprehensive
[111] godi-extlib 1.5.1 1.5.1 User-supported Extended Stand
[114] godi-findlib 1.2.7 1.2.7 The findlib/ocamlfind package
[166] godi-ocamlgsl 0.6.2#1 0.6.2#1 GSL bindings for OCaml
[167] godi-ocamlmakefile 6.29.3#1 6.29.3#1 Generic Makefile to build OCa
[186] godi-pcre 6.1.0#1 6.1.0#1 Perl compatible regular expre
[200] godi-tools 2.0.15 2.0.15 godi_console and other tools

4. In the attached zip file you can find the executables ccm and ccm_gibbs, which were compiled on Ubuntu 10.10. You can also compile the source code yourself by running the makefiles in this order:
- 1. compile myDynArray
- 2. compile hashSet
- 3. compile util
- 4. compile ccm, ccm_gibbs
Please change the paths in the makefiles to match your own setup.

5. Prepare the data. I wrote a small tool for this step. The folder 'corpus' contains two WSJ corpora: treebank and right-branching. The latter can be used to initialize CCM if you want. To complete this step, just run runme.sh. You will then find four files in the folder 'data':
- poses and brackets: POS and bracket sequences from the right-branching corpus
- test_poses and test_brackets: POS and bracket sequences from the WSJ treebank corpus
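
To make "POS and bracket sequences" concrete, here is a small Python sketch. It is not the tool in the zip file, and the actual format of the files in 'data' may differ. It assumes Penn Treebank style bracketed trees and represents a bracket as a (start, end) pair of word offsets with the end exclusive; the function names are made up for illustration.

```python
def parse_tree(text):
    """Return (pos_tags, spans) for one bracketed tree.

    spans are (start, end) word offsets, end exclusive, one per
    non-preterminal node, listed in post-order.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos_tags, spans = [], []
    index = 0

    def node():
        nonlocal index
        assert tokens[index] == "("
        index += 1
        label = tokens[index]
        index += 1
        start = len(pos_tags)
        if tokens[index] != "(":
            # Preterminal of the form (TAG word): keep the tag, skip the word.
            index += 1
            pos_tags.append(label)
        else:
            # Internal node: parse all children, then record the span.
            while tokens[index] == "(":
                node()
            spans.append((start, len(pos_tags)))
        assert tokens[index] == ")"
        index += 1

    node()
    return pos_tags, spans


def right_branching_spans(n):
    """Brackets of a fully right-branching tree over n words (width >= 2)."""
    return [(i, n) for i in range(n - 1)]


tags, spans = parse_tree("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
print(tags)                      # POS sequence of the sentence
print(spans)                     # treebank brackets
print(right_branching_spans(3))  # brackets usable for initialization
```

The right-branching brackets depend only on the sentence length, which is why they are a convenient initialization when no gold trees are used.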

6. Run ccm / ccm_gibbs. Take a look at the file 'config'. Its structure is:
senlen [int] // maximum sentence length
testlen [int] // maximum test sentence length
limit [int] // maximum number of sentences
dir [string] // the path of the data directory
poses [string] // POS sequence file - for initializing
brackets [string] // bracket sequence file - for initializing
test_poses [string] // POS sequence file - for testing
test_brackets [string] // bracket sequence file - for testing
restarts [int] // number of test runs (you can run the test several times to collect statistics)
addc [int] // I think it is for smoothing (see the second paragraph on page 1413
// of Klein and Manning)
addd [int] //
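
As a rough illustration of this layout, the Python sketch below reads such a file. It assumes (this is my guess, not something I checked in the OCaml source) that each line holds a key, some whitespace, then a value, optionally followed by a '//' comment. The sample values are invented.

```python
def parse_config(text):
    """Parse 'key value' lines into a dict; integer-looking values become int."""
    settings = {}
    for line in text.splitlines():
        # Drop a trailing '//' comment (assumes values never contain '//').
        line = line.split("//", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected 'key value', got: " + line)
        key, value = parts[0], parts[1].strip()
        settings[key] = int(value) if value.lstrip("-").isdigit() else value
    return settings


# Hypothetical example values, for illustration only.
sample = """
senlen 10      // maximum sentence length
testlen 10
dir ./data
poses poses
restarts 5
"""
print(parse_config(sample))
```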

To run it, just call './ccm config'

The F-score should be about 67%. This is lower than the result reported in Klein and Manning's paper (71%). I have not found the reason yet.
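
For reference, here is a minimal Python sketch of an unlabeled bracket F-score, in case you want to check the number yourself. It is not taken from the ccm code. It assumes brackets are (start, end) pairs and that counts are pooled over the whole corpus; the exact conventions of the original evaluation (for example, which trivial brackets are ignored) may differ, and such choices can shift the score.

```python
def bracket_f1(gold_corpus, predicted_corpus):
    """Corpus-level unlabeled precision, recall and F1.

    Each argument is a list with one collection of (start, end)
    brackets per sentence, in the same sentence order.
    """
    matched = gold_total = predicted_total = 0
    for gold, predicted in zip(gold_corpus, predicted_corpus):
        gold_set, predicted_set = set(gold), set(predicted)
        matched += len(gold_set & predicted_set)
        gold_total += len(gold_set)
        predicted_total += len(predicted_set)
    if matched == 0:
        return 0.0, 0.0, 0.0
    precision = matched / predicted_total
    recall = matched / gold_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# One three-word sentence: treebank brackets vs. right-branching brackets.
gold = [[(0, 2), (2, 3), (0, 3)]]
predicted = [[(0, 3), (1, 3)]]
print(bracket_f1(gold, predicted))
```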

Best,
Phong