2.1.0
------------
FINISH: Reporting for analyzing multiple runs.
 - START reporting tree convergence statistics.
 - START reporting the deviance measure that is reported by Lartillot.
IMPLEMENT: automatic step size detection for MH based on BEAST?
COMPARE with 2.0.2 on some single-partition data sets 
 - EF-Tu-45? Globins-few.fasta? 5S-48? HIV? Enolase?
GENERATE: some examples.
REPORT: Include at least one tree-mixing measure.
 + Obtain last 1/3 of dist as equilibrium, sub-sample down to 600 trees
 + Plot distance to equilibrium vs. time, sub-sampled down to 1000 trees.
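
A minimal sketch of the "last 1/3, then sub-sample" step, assuming the trees are addressed by sample index. The function name and the even-thinning rule are illustrative assumptions, not existing code.

```cpp
#include <cstddef>
#include <vector>

// Indices of the samples kept when we take the last third of a chain of
// n_samples and thin it evenly down to at most max_keep entries.
// Hypothetical helper: the name and thinning rule are assumptions.
std::vector<std::size_t> equilibrium_subsample(std::size_t n_samples, std::size_t max_keep)
{
    std::vector<std::size_t> kept;
    if (n_samples == 0 || max_keep == 0) return kept;

    const std::size_t avail = n_samples / 3;          // size of the last third
    if (avail == 0) return kept;
    const std::size_t start = n_samples - avail;      // first index of the last third
    const std::size_t n_keep = avail < max_keep ? avail : max_keep;

    kept.reserve(n_keep);
    // Since avail >= n_keep, the offsets below are strictly increasing.
    for (std::size_t k = 0; k < n_keep; ++k)
        kept.push_back(start + (k * avail) / n_keep);
    return kept;
}
```

With 3000 samples and max_keep = 600 this keeps 600 indices starting at 2000.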

Make alignment-max stop storing so many alignments in RAM at once...
# We could clearly do an incremental algorithm.... IF we knew in advance which
  alignments we would keep...

Add a tree-distance convergence plot.

Search: SHOULD we force it to evaluate a sum over alignments for at least a few other trees if the
      current branch comes out best?
      * Use a boost::bitvector<> -> 32 times less RAM?

Examine prior rate dispersal for DP[2], DP[4], DP[6], DP[8] with no data.
 - Make it tighter a priori, so that posterior dispersal actually means something...

Examples: add more?  Microbotryum?

*Burnin*
- perhaps burnin would benefit from some 'wider' moves
  + SPR+slice sample branch length.
  + Double-augmentation -- proposing trees with a different alignment.
    (SPR-Gibbs-Sampling.lyx)

TEST: What will decrease the burnin time for Globins-few.fasta?
 - Variance will be high, so we need to do at least 5 runs for each data set, preferably 10.
 - Will NNI+A decrease the time?
 - Will removing SPR decrease the time?
 - Will doing more SPR-flat+A decrease the time?

*Report* Make trees-consensus write out a list of partitions directly, so we don't need perl?

*Documentation*
 - Make programs more self-documenting.
   + Make statreport describe PSRF-80%CI and PSRF-RCF.

*Report* - Improve the SDSF plot.
 - Should we output ALL groups in --LOD-table, perhaps by including column headers?
 - How should we report the vertical range beyond min->max?
 - How can we make an SVG or PNG of this plot?

*Timing*
 - Make a cheap NNI move that doesn't change branch lengths.
   + Changing branch lengths is more expensive for codon models.
 - What is the effect of NNI_AND_A on 18S burnin?
 - Allow altering frequency of various moves.
 - Decrease default frequency of proposals for substitution moves.
   + Increase dirichlet priors from 1.0 -> 2.0?
 - Are SPR_and_A_flat( ) and SPR_and_A_path( ) really taking twice as much CPU time as SPR_and_A_all( )?
   + Might this relate to SPR_and_A_all( ) ignoring cherries?
   + Or, perhaps it is because SPR_and_A_all doesn't resample the alignment
     when the proposed tree is not different from the current one.
     - Probably it SHOULD resample the alignment... but only once, not twice.
     - Probably we should FORCE it to propose a different tree.
 - Run nodes_master more often.... for long sequences?  When substitution is slow?
 - Why am I getting +-1.0e-15 errors in some times?
 - Why does SPR_and_A_nodes sometimes register 0 CPU time despite having 100 calls?

Automatically adjusting step widths for MH?

--
2.2.0:

Make LC able to get shorter.
 * Separate set_length( ) and 'reserve' for likelihood caches.

Number of tokens may be misleading, since we create temporaries.
 * Currently, we need 2*B branches w/o SPR_and_A_all, and 4*B with.
 * Allow temporarily storing, and then re-associating, a Likelihood_Cache_Branch.

Examine effect ON BURNIN of forcing SPR-all+A to check at least one other alignment...

Handle letters outside of our alphabet more gracefully.
 - Allow ANYTHING if the alphabet is specified.
 - Allow ANYTHING if the frequencies aren't off by too much.
 - Only refuse to run if we have weird letters AND the frequencies
   are off by too much.

Alignment Constraints #1: Implement a bandwidth around the current path
 - 2x penalty, because we need to look at the reverse proposal :-(
 - How do we handle multiple partitions?
 - How do we handle rejection based on sequence length?
   + Latex up this kind of thing!

Alignment Constraints #2:
 - Fix the alignment of some subset of the sequences, regardless of
   where they live on the tree.  This would allow us to align a
   "query" sequence to a "template" alignment.
 - Actually, this will prevent SPR from working...

Preliminary implementation of #openmp

Specify WHAT FRACTION OF TIME to spend on each move.

Make a real interface for fixing the alignment, topology, frequencies, branch lengths, etc.

Jointly sample A/epsilon A/lambda.  3-way/2D?  Yeah.  But also, 5-way, 3way, etc.

DOC: Change FIXME to Doxygen \todo

FINISH: C20 resample aa frequencies.
 - Why does the C20 model give lower likelihoods?
 - And, longer alignments?

Determining Memory Size
 * use <sys/sysinfo.h> on Linux to determine memory size.
   * http://stackoverflow.com/questions/349889/how-do-you-determine-the-amount-of-linux-system-ram-in-c
 * use /usr/bin/vm_stat on Mac?  Or /usr/include/sys/vmmeter?
 * on cygwin... /proc/meminfo? Or sys/sysinfo?
 * on mingw... ???
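
A sketch of the Linux case only, using sysinfo() from <sys/sysinfo.h>. The function name is an assumption, and returning 0 for "unknown" on the other platforms is just a placeholder for the open questions above.

```cpp
#if defined(__linux__)
#include <sys/sysinfo.h>
#endif

// Total physical RAM in bytes, or 0 if it cannot be determined.
// Only the Linux branch is filled in; Mac, cygwin and mingw are left open.
unsigned long long total_ram_bytes()
{
#if defined(__linux__)
    struct sysinfo info;
    if (sysinfo(&info) != 0) return 0;
    // totalram is expressed in units of mem_unit bytes.
    return static_cast<unsigned long long>(info.totalram) * info.mem_unit;
#else
    return 0;
#endif
}
```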
Impose memory limits???
   * On linux, do ulimit -v <kilobytes>
   * on mac, ulimit -v has no effect.
   * on mingw...
   * on cygwin...
Bail on some routines if we ask for too much RAM?
   * but what if we ALWAYS bail?
   * then we should actually quit.
Move "alignment-burnin" to setup-mcmc.C
   * We could do more NNI+A, for example...

Cleanups:
 * Rename s/lambda/weight/ in mcmc.{H,C}
 * Make MoveOne do sampling W/O replacement!
 * Lazily recalculate the branch transition matrices.
   - How does the cache object know which SModel to use?
   - If it holds a pointer to it, do we need to update the pointer when we copy the Parameters object?

SPEED: Resampling the following parameters shouldn't require peeling along the tree:
 * Changing the rates in each category should require re-peeling.
   - (log-normal and gamma)
 * Changing the frequencies of submodels with different rates, actually changes the rates.
   - (DP::f[] and M3::f[])
 * Reallocating frequencies between submodels with the SAME rates should NOT require re-peeling.
   - (C10, C20)
 * It should ALSO not require recalculating transition matrices! (which is WHY re-peeling is not necessary.)

*Report* - Alignment Mixing
 - I can generate a mixing graph by comparing column probabilities for different alignments...
 - I can also generate ESS values per column.
 - This is only going to be meaningful when the column support is not too small or too large.
 - I can generate more plots, if R is installed.
   * Generate the mixing diagnostics chart.  If one chain only, take 2,3,4 quarters.
   * Make trees-bootstrap write out the LOD's for each group to a tab-delimited file,
     if asked.

ADD a 'Covarion' model from an exchange matrix and with n levels of invariance.
- How to allow column-to-column variation in rho?

*Report* Speedup: implement --output=svg,pdf in draw-tree.

Alignment-indices [constraints]: intelligently find the previous/next column.
 - Actually, do something more intelligent based on pairwise things, like FSA does...
   + But then how would you take advantage of local conservation... "ALL sequences have a G"

Make moves report meaningful names for their various statistics.
 - Make the MCMC::stats( ) object into a map<string,pair<int,double> > or something.
 - Make NNI report how many branches peeled on average,
 - Make SPR report how many branches peeled on average divided by T.n_branches()
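
A sketch of the map<string,pair<int,double> > idea: each named statistic keeps a count and a running total, so averages like "branches peeled" fall out directly. The class and method names are assumptions and do not reflect the current MCMC::stats( ) interface.

```cpp
#include <map>
#include <string>
#include <utility>

// Named move statistics: name -> (number of observations, running total).
// Hypothetical class, for illustration only.
class MoveStats
{
    std::map<std::string, std::pair<int,double>> totals_;
public:
    void record(const std::string& name, double value)
    {
        auto& entry = totals_[name];   // value-initialized to (0, 0.0) on first use
        entry.first  += 1;
        entry.second += value;
    }
    int count(const std::string& name) const
    {
        auto it = totals_.find(name);
        return it == totals_.end() ? 0 : it->second.first;
    }
    double average(const std::string& name) const
    {
        auto it = totals_.find(name);
        if (it == totals_.end() || it->second.first == 0) return 0.0;
        return it->second.second / it->second.first;
    }
};
```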

Output internal node sequences, along with trees that have NAMED internal nodes.

*Documentation*
 - Doxygen 'front matter' page:  
    + Describe 'Model' overview and specific classes and functions.
 - Doxygen: document 'Parameters'    
 - Allow `bali-phy --help=topic'.  Allow --help=advanced to report the invisible advanced options.

*Report* Report a 95% confidence interval of splits?

Branch priors - make into a sub-model, allow a mixture-of-two-{Exponential, Gamma}.

Build constraints the same way FSA does.

Replace T.branch(  ) with T.undirected_branch( ) or T.branch( ).undirected(). [step 1]
Replace T.directed_branch( ) with T.branch( ). [step 2]

*Report* - Branch lengths for mc-trees
- Can I get branch lengths for the mctree... but without node sizes?
- For each branch, mask out taxa that wander over 
  * edges directly connected (mask of A, A<<B, A does not directly wander)
  * edges that wander over the same region as we do (~A.group1, A<<B, A directly wanders) 
  Node masks simply mask out all taxa that wander over connected edges.

  Then, if a branch implies a partition with its new, increased, mask, then contribute to that partition.

  What are the resulting corner cases?
   - the same branch can imply two parallel edges.
   - if X and Y wander over 12|34, then what if the sample tree has XY|1234 ?
     + X and Y have the reduced splits X|1234 and Y|1234, so XY|1234 implies both.
     + we therefore give half the length to each one.
   - when does each branch in a compatible tree imply just one reduced split?


--
P1. SPR: intelligently select subtree, as well as attachment point?

P29. See genome-wide transducer paper.  Can I impose similar constraints?
 * How do they record/store their bandwidths on the DP matrix?

P28. Improve alignment-indices.
P5. Speed up post-processing.
P6. Improve usability. (post-processing?  installation?  visualization?)
P11. Make post-processing handle multiple chains, assess convergence.
P16. Generate better mixing diagnostics (RNe1 and RCI1 aren't really working.... Report SDSF for each. MSDSF is OK.)

P7. ECR2 moves (for fixed-A, and for variable-A, with P4).
P9. MCMC / MPI - Construct a heated alignment prior.

P8. Branch priors - make into a sub-model and allow a mixture-of-two-{Exponential, Gamma}.
P15. Make matcaches (1) [b][m] instead of [m][b] and (2) be able to invalidate individual branches.     

P17. Improve alphabet handling for speed, flexibility, and features.
P18. Allow alphabets to handle unknown letters in some way... detect alphabet based on fraction of atgcu
P22. Speed up handling of large trees, alignments.

P21. Make debug builds yield the same output.

P4. Allow deterministic searching for internal node states, for SPR tree moves without re-aligning.
P23. What should the ideal mixing rate be for topologies? (accept based on MCMC fractions)
P20. Adaptive bandwidth computation
- Consider a move with a bandwidth around the current path
- Consider specifying an input alignment around which to calculate bandwidths.
  + (?) Use the input alignment in the prior?
P24. Improve speed of dynamic programming
- sort states to group those with the same prev state (i.e. same di and dj)
P2. Determine how much tree sampling could be improved by proposing with the right branch lengths.
 - Implement a kind of SPR_all move, but based on the number of topology samples, only.
 - Output is posterior probability of topology.
 - Is the probability of the data given a topology just the average of the likelihoods given the topology?

P25. Fix Slice NNI, and determine why 3-way NNI isn't better than 2-way NNI.
 * Slice NNI changes w/ max probability 50% in 2-way case, 66% in 3-way case.
   - Can we improve this, somehow?  Ask Paul Lewis?  Explain NNI-slice moves?
 * Why is the 3-way NNI (choose_MH) only accepting slightly more often than the 2-way NNI?

P26. Improve drawing of multiconnected trees by using equal-daylight.
P27. Make alignment constraints not waste memory for unused cells.

P30. Write a boost::spirit parser for the Newick trees
P31. Allow reading/writing branch, node, and tree attributes.
P32. Figure out how tracer is doing color blending for overlapping marginal densities... it looks "right"!
P33. Slice sampling for scale-means-only.... rethink parameterization?
P34. How to make the sub-level graphs less spiky?
 - implement back-off - that is, allow thinning out of 
P35. [Cleanup] Move MC_tree_with_lengths to mctree.H, and make an operator<<( ) for it.
P36. Lazily recalculate branch transition matrices.
P38. Can I eliminate MCMC::Move.iterate(P,Stats,int) for moves that do not need the int?




P17. Improve alphabet handling for speed, flexibility, and features.
     - [!]Speed up mapping of letters to integers.

     - (?) Handle lower-case nucleotides by allowing synonyms for letters.
     - (?) Make upper-case letters synonyms?
     - Individually add letters: add_letter("a","Adenosine","A")
     - Single-letter alphabets just use an array to map letters to integers.
     - Generic word alphabets use a map<string,int>.
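
     A sketch of the single-letter case, assuming a 256-entry array lookup and
     treating synonyms as extra characters that map to the same index. The
     class name is made up; only the add_letter("a","Adenosine","A") call
     shape comes from the note above.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Single-letter alphabet: a flat array maps each character to an integer,
// with -1 meaning "not in the alphabet".  Hypothetical class.
class LetterAlphabet
{
    std::array<int, 256> index_of_;
    std::vector<std::string> names_;
public:
    LetterAlphabet() { index_of_.fill(-1); }

    // Add one letter plus any synonym characters; returns the letter's index.
    int add_letter(const std::string& letter, const std::string& name,
                   const std::string& synonyms = "")
    {
        if (letter.size() != 1)
            throw std::invalid_argument("expected a single-character letter");
        const int idx = static_cast<int>(names_.size());
        names_.push_back(name);
        index_of_[static_cast<unsigned char>(letter[0])] = idx;
        for (char s : synonyms)
            index_of_[static_cast<unsigned char>(s)] = idx;
        return idx;
    }

    int operator[](char c) const { return index_of_[static_cast<unsigned char>(c)]; }
    const std::string& name(int i) const { return names_.at(static_cast<std::size_t>(i)); }
    std::size_t size() const { return names_.size(); }
};
```

     Lower-case and upper-case forms then share an index, and the lookup is a
     single array access rather than a map search.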

P16. Generate better mixing diagnostics
     - RNe1 and RCI1 aren't really working, though MSDSF is OK.
     - Report SDSF for each split.
     - Report what lartillot is reporting, also.

P10. Make CAT models faster.
 - Can this be virtualized?  How much speed do we lose by computing exp(Qt)*v using a virtual call?
 - Can I avoid recalculating exp_a_t so often?

P5. [Post-processing:]
  + We actually spend a lot of time DRAWING the mc trees.
  + What is a fast way to estimate node lengths?

  + When printing SRQ plots, indicate the LOD range (or its abs?).
  + Make trees-bootstrap less RAM-intensive: don't cache so many things.

P?. Why does INV+Modulated+gamma[4]+INV crash?
    - Because you can't add models with different alphabet sizes ATM.

P8. [Branch priors]
1. [DONE] Use Gamma(1/2,mu) 
2. Allow also Uniform(0,n), Exponential(mu), and ???
3. What IS the second-largest eigenvalue... what does it mean?
4. leaf branches / internal branches
5. general prior against changes being shared?
6. How hard would it be to put 1% on zero length?

P16. [Improving Mixing Diagnostics / Improve Distribution-Comparison Diagnostics]
1. AWTY is right: some of these diagnostics are going to be plots.

3. Generalize trees-bootstrap to
 (a) Write out Global estimates
 (b) Write out Mixing diagnostics
 (c) Compare the different distribution groups.

4. What plots should be generated?
 (a) [AWTY] Cumulative & Sliding Window pictures of the most variable plots.
 (b) [AWTY] Pairwise comparison of split frequencies between two runs.
 (c) [other] Give an ordered list of splits, with a 95% confidence interval for the estimates.
 (d) 'Var' analysis? -- Compare within / between variance for multiple chains.

5. Diagnostics
 (a) For discrete ranges: average fraction of total confidence interval in individual 80% ranges/0.8
     * This should not allow large changes in the statistic for infinitesimal changes in the distribution.
 (b) 

P26. Improve draw-tree
 1. Make draw-tree use device coordinates
   + But make this a separate, final, step?
 2. Put label lengths into leaf branch lengths.
 3. Fix equal-daylight to handle the case when there is no daylight available.
   + If there is no daylight, then make all angles go a /= 2
   + (?) Recalculate daylight under the condition that distance branches are ignored?
   * Also: do Dave's monkey-puzzle technique.
 4. Represent multiconnected tree as a regular tree, and use equal-daylight to place nodes,
    although they will have clouds
   + At first, assume that wandering branches attach at existing nodes, instead of in the middle of branches.
 5. Do not draw the line UNDER the cloud.... instead draw the line and cut it off at the cloud boundary. 
   + This requires some kind of path API.

P38. If a sub-move is called multiple times in a turn, it needs to know which time it is being called.


----
MPI
1. Output
2. Good estimates of LOD score < -3 and > 3
3. Good diagnostics for MC^3
4. How to implement rescaling of branch HMMs?

Alignment issues:
 - sometimes we actually WANT to retain empty columns! (For alignment-diff)
 - sometimes we actually WANT to allow multiple sequences.
 - can I figure out all places where these checks get called, and move the checks to there?

FIX DP::rate_dirichlet_N to multiply by the number of categories in the dirichlet...

Output and Postprocessing
 (A) Continue to improve HTML output to make it nicer.
 (B) Include more and more information
 (C) Allow COMPARING alignments
   (i)  Align the alignments :-)
   (ii) Allow drawing two alignments, one on top of the other.
   (iii) Shade differences in the second alignment in red?
   (iv) Write an alignment-diff that outputs text differences?   
 (D) Update the framework for the analysis script.  Move beyond PERL?
 (E) Make a Cairo version of alignment-draw to generate PS/PDF/SVG

Multiple Data Partition Stuff
 A. Partition-specific alphabets/smodels/imodels
    (b) ALPHABET        - allow specifying

 f. Currently we are altering the smodel inside the partitions.
    This requires copying it...
    (a) how expensive is this?
    (b) how easy to change?

 g. Did we add any slowdown in the -t case? In the other case?

Code Cleanliness

IDEA: parse "...+var~dist"  (could be +Something[var,dist]


"Rooted" version of indel models?
 A. Can we incorporate a better sequence length distribution?
 B. How to handle rooted trees?
    B1. Where is reversibility assumed?
    B2. If reversibility is only in the indel process, might be OK.
      (e.g. have one "root" for each process - separate subst / indel "root")
    B3. Need to change the DP engines?
 C. "Conditionally normalized" indel models (transducers...)

Add more Markov-modulated models:
 A. Right now we can do 
    (a) Covarion:   INV+Modulated
    (b) Periodically resample rate from same distribution as across-site rate distribution.
 B. What models should we add?

General construction of MHMMs from PHMM
 A. specify emission probabilities in PHMM
 B. write a GENERAL routine to construct the MHMM given emission probabilities.
    B1. Cache the MHMM structure - e.g. will computing this become expensive?

Changing the number of categories
 A. just code it.  Perhaps slow, but...
 B. Non-DPP prior on number of columns in each category.
 C. DPP when number of columns is changing?
 D. 150 pi * 4 rates = 600 bins = impossible?

Improved Codon models/notation?
 A. F1x4 / F3x4 / F61
 B. We should be able to do some Marginal Likelihood tests on these.

-1. M7 should take a complete model, not just an exchange model!

0. The trees-consensus "--ignore taxon1,taxon2,.." functionality doesn't actually work?

1. Fix the Uniform distribution to have a sigma/mu parameter.

2. Branch length proposals
   (a) Can we SPEED UP the procedure to fit/estimate the posterior to a gamma?
       i. Guess the starting distance and variance from the conditional probabilities on
          both sides of the branch...
      ii. But what if there are multiple rates or frequencies?
   (b) Single branch: can we accept more NNI/SPR if we sample the branch length?
   (c) Can we sample from a MULTIPLE branch distribution?
   (d) Can we ignore the indel effects on branch lengths when we do?
   (e) How does the acceptance rate for fit_gamma depend on n_retries?

3. Multiple branch-length proposals
   (a) When considering multiple branch lengths, the information CAN be about the
       specific branches T[i] instead of the distances D[i][j] if we can infer WHICH
       branch the mutations happened on.

4. Topology proposals which alter multiple branches:
   (1) SPR connecting branch
   (2) SPR branch + merged branch
   (3) SPR branch + split branch (2)
   (4) SPR branch + split branch (2) + merged branch (1)
   (5) SPR branch + split branch (2) + adjacent in pruned tree (2)
   (6) SPR branch + split branch (2) + merged branch (1) + adjacent in pruned tree (2)

   (6?) SPR branch + merged branch (1) + neighbors (4)
   (7?) SPR branch + split branch (2) + neighbors (4)
   (12?) SPR branch + merged branch (1) + neighbors (4) + split branch (2) + neighbors (4)

5. How much of the sequence is in each %-identity fraction?
  a) How much information is in each fraction?
  b) How do we make sure that mismatched residues in the 90% identity fraction 
      don't end up in the 0% identity fraction?
  c) Do gaps count as "mismatches"? 
     Which leads to...
  d) Can we combine measures of gappiness and %-identity to yield a
     measure of ambiguity? 

6. GTR: When setting parameters, set all relevant parameters at once?

7. Write a program to try to combine trees (sub-branches?)
   (a) Print out the tree of no-conflict branches
   (b) Show conflicting pairs of branches from each file.
   (c) Start with trees - no branches in the same file can conflict.

8. Implement Correlated Site Classes.
  (a) Do the math.
    (i) When caching HMM state values, we must ALSO cache rates!
  (b) Do the coding?

9. Estimate constraints using alignment-indices
  a) specify number of constraints
  b) no gaps nearby (distance W+G) from constraint
  c) difference from consensus = 1 (no shared differences)
  d) 
  e) percentage of sites w/ an indel
  f) print the column number in the comments of each row

10. Use a RJMCMC on a fixed alignment to model locate changes in 'mu'
   +  or use a change-point model?
   +  or use a *nested* Change Point Model?
   + we need some way to find out if the gamma model "fits" well! (how
     much of the variation is spatial?)

11. Find a way to display differences between alignments (alignment distributions?)
   + show how alignments vary
   + find parts of alignments which COVARY with the tree
     - does the tree cause the alignment to vary, or vice versa?

2. Put "epsilon" on the "mean_indel_length" scale.
5. Proposals and Priors
  a. use a parameter_names_ vector.
6. Output move stats in tracer-readable format. (or, at least, allow burn-in?)

7. How about having an alphabet option "AAA -" to indicate that this item is missing?

--------------------------------------------------------

-48. Indel model where we sum over all root locations: doesn't work for SPR?
  a. A mixture of P(A|root) doesn't factor into P(A[1]) * P(A[2]) * ... * P(A[B])
  b. We would have to augment the data with a root.
  [NOTE] But can we resample an alignment under a *mixture* of distributions.

-46. Code an initial test for the same-frequencies assumption?
	- likelihood ratio test: sequences S have frequencies p
                                 sequence S[i] has frequencies q 
	- can we reject p=q?  How much is the actual difference?
        - or, find groups using neighbor-joining? 
          + (twice as many branches..)
	  + N^2 cost of computing the pairwise distances...
            - how to make semi-additive distances?

-45. Speed up likelihood calculations where there are lots of gaps??
  + If there is only one character then (at least for equilibrium processes)
    the cost should be constant.
  + If there is more than one character, then the  cost should be proportional to
    the number of branches that connect the non-gap characters.

-------------------------------------------------------
 o Parsimony Model -> constant branch lengths
   - only change SPR, disable branch-length moves
     * this is if we do it at the tree-level
     * this SHOULD handle the multiple-rates case... or does it?
   - or, do it at the level of the substitution model
     * how do we set the length to "mu", or how do we handle P.branch_mean() in this case?
     * this SHOULD handle the multiple-rates case... or does it?
 o Gamma branch length model
 o compare posteriors for Full model, Parsimony model, and in-between model
   where T[i] = mu^a * b[i]^(1-a)
 o Use the parsimony posterior to propose or select far-away trees
   - (but is this any better than simply bootstrapping the posterior sample?)
   - MCMCMC approach with no (with parsimony) branch lengths.
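
The in-between model T[i] = mu^a * b[i]^(1-a) can be sketched directly. With a = 0 it returns the sampled lengths b[i], and with a = 1 every branch gets the constant length mu. The function name is an assumption.

```cpp
#include <cmath>
#include <vector>

// Interpolate between free branch lengths b[i] (a = 0) and a constant
// length mu on every branch (a = 1):  T[i] = mu^a * b[i]^(1-a).
// Assumes mu > 0 and all b[i] > 0.
std::vector<double> interpolate_branch_lengths(const std::vector<double>& b,
                                               double mu, double a)
{
    std::vector<double> T;
    T.reserve(b.size());
    for (double bi : b)
        T.push_back(std::pow(mu, a) * std::pow(bi, 1.0 - a));
    return T;
}
```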

 o Separating NNI & SPR into their own files
 o Add ECR 
   - re-sampling internal node states on more than 2 branches
 o Writing a function to approximate the branch-length distribution
   for a large number of adjacent branches
 o a better form for the joint branch density than a normal.

-------------------------------------------------------

-41. Improve proposals: 
  + Specify the NUMBER of times we do each increment per iteration.
  + How can we reference other parameters of the model? (e.g. "mu")
  + How do we pass back information about move success from each move?
  + Can we have the move parameters converge to something?
    - Should this be in the PROPOSAL, or something governing the proposal?

-------------------------------------------------------
Paper:
 * how to tell if a tree distribution is following a guide tree?
   Average # of shared branches?  How else to measure bias resulting
   from a fixed alignment?

TODO:
 * Write a tool to take a list of files, find common columns, and
    write out differently named files: 1.p, 2.p, ...
   - warn about over-writing files.

 * tree-mean-lengths: compute the length of each branch or node, conditioned on the
   existence of said branch or node.
   - if a branch doesn't exist - give it length 0.
   - if a node doesn't exist... also give it length 0?
 * fix alignment-find when last alignment is chopped.
 * alignment-info
   + use a neighbor-joining tree if no tree is supplied?
   + report # of indels (but internal sequences are missing!)

-------------------------------------------------------
Questions:

 Q: How can the distribution of INV::p be so exponential? (AATS)
 A1: Strong incorrect prior, not enough data!
 A2: Probably this happens more with gamma[] than log-normal[], since the gamma
     allows very low rate classes.  Does it happen with DP[5]?

-------------------------------------------------------

-37. Improve trees-consensus
  + Compute the entire set of branches / counts for full partitions.
  + Compute the greedy tree using these counts.
  + Report the first 4(N-3) internal branches.
    - Also report the  branches in the greedy tree?
  + [FIX] Currently the number of sub-branches is capped at |c50|
  + [FIX] 'ratio' for sub-partitions when parent partitions are below the min.
  + store topology as list of locations in partition list?
   a) Would this save memory?
      Some, if some of full partitions have a high frequency.
   b) Would this save CPU time?  Examine each partition once instead of n times.
      [But this wouldn't help for sub-partitions, would it?]
   c) We could report the number of full partitions that we observe.

-34. Can we put a Gamma prior on branch length? (Conjugate?)

-30. Find out where the system time is coming from...
  + Find out the system time for a test case with DP full-matrix.
  + Where is the system time coming from? (Use oprofile?)

-27. More data sets - more proteins!  
     Read some more tree-of-life papers.
	Which protein was it that supposedly did not saturate?  Can we plot saturation?  
	Can we determine the equilibrium distribution (effective entropy) ... by thinking about T ... ?

-24. Benchmarking of mixing... 
  a) try with alignment fixed
  b) look at log autocorrelation and effective sample sizes for distances
     between 2 nodes + sum all branches + sum all paths
  c) compare mixing with few taxa, many taxa...
  d) handle alignment predicates in the SRQ plot.
     can we output residues which are mixing badly?

-22. FastLSA with Constraints:
  + We have a series of tiles that are linked at their corners.
    - In each tile, we go forward given the (0,0) corner
    - We sample backwards given the (last) state at the (I,J) corner.
    - We need to SHARE the emission probabilities between tiles.

-20. MCMCMC
    a) implement the forking code / temperature identifier.
    b) implement the separate random number generators.

-16. Compare estimates w/ MrBayes
     a) compare mixing, topologies, branch lengths, speed, etc.
     b) globin / EF-Tu 48b / P1-123 / P1-1234

-13. Make a program to compare two alignment samples. [N=1 is a special case]
     a) If P(i~j) > 0.5, then compute P_1(i~j)P_2(i!~j)
     b) Report by marking a consensus alignment with a continuous or threshold color scheme.
     c) How does a distribution of alignment CONSENSII fit into this?
 
-12. Allow alignment-gild to annotate an alignment-consensus

-10. Make trees-bootstrap.C print trees instead of collections of full partitions.

-7. Sanitize use of the MAP tree when the count is 1. Report a median?
     + but the estimated probability of the median could be 0.
     + this matters only when computing branch lengths?

-3. Think more carefully about using walk_tree_path
     + when changing the topology...  does this have to be reversible?

------
1. Plot #topologies in confidence interval (0,p) vs p. (e.g. sort topologies)
    Then plot this versus time...

    Also compare this between runs...

    Figure out burnin for bacteria data sets...
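
    A sketch of the quantity to plot, assuming we already have a count of how
    often each distinct topology was sampled: sort the counts in decreasing
    order and report how many topologies are needed to reach cumulative
    frequency p. The function name is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Number of distinct topologies needed, most frequent first, for their
// combined sample frequency to reach p (0 <= p <= 1).
std::size_t n_topologies_in_interval(std::vector<std::size_t> counts, double p)
{
    std::sort(counts.begin(), counts.end(), std::greater<std::size_t>());

    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    if (total == 0) return 0;

    std::size_t cumulative = 0, n = 0;
    for (std::size_t c : counts)
    {
        if (static_cast<double>(cumulative) >= p * static_cast<double>(total)) break;
        cumulative += c;
        ++n;
    }
    return n;
}
```

    Evaluating this over a grid of p values gives one curve per run; doing it
    on growing prefixes of a run gives the versus-time version.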

3. Indel model - fix so that prior probabilities exist!
    a) for now, we condition on leaf sequence lengths.

4. Start making some "nice" pictures to put up on the website.
    a) include pictures to give to Dr. Lake's lab.

5. Allow branch length mixtures?

6. Improve mixing for multi_freq model by sampling between columns.
    a) well, firstly, why does convergence take SOO long?
    b) if we could easily just change bin probabilities, then that would
       make it easier to attack this problem by increasing the number
       of MH steps we do.

7. Make a hash function for sub-partitions that is 
     (hash(left mask) * hash(right mask)), where the hash() function
     is a hash function for bitmasks and involves addition, not
     multiplication. 
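
     A sketch of such a hash, assuming bitmasks are stored as vector<bool>.
     Each taxon gets a fixed 64-bit key (here a splitmix64-style mixer, which
     is my choice and not something specified above), a mask hashes to the
     sum of its keys, and the two sides are combined by multiplication so
     that swapping left and right gives the same value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed pseudo-random key for taxon i (splitmix64-style mixing; an assumption).
inline std::uint64_t taxon_key(std::size_t i)
{
    std::uint64_t z = (static_cast<std::uint64_t>(i) + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Additive hash of a bitmask: the sum (mod 2^64) of the keys of the set bits.
std::uint64_t hash_mask(const std::vector<bool>& mask)
{
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) h += taxon_key(i);
    return h;
}

// Sub-partition hash: symmetric in (left, right) because multiplication commutes.
std::uint64_t hash_subpartition(const std::vector<bool>& left,
                                const std::vector<bool>& right)
{
    return hash_mask(left) * hash_mask(right);
}
```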


------
A. sub-partitions: 

*B. How to make priors specified on command line?

C. Debugging framework 
   a) platforms
     1) intel c++
     2) cygwin
     3) MS Visual C++
  b) stlport
  c) compare
     1) MrBayes
     2) bambe
     3) BEAST
  d) mudflap -> alter GNUmakefile so that this is easy

*D. Make alignment-draw be able to output EPS

E. how does P(data|model) depend on number of frequency bins?

F. Allow alignment to contain only 2 sequences?

G. Find a better distance measure for alignment-median?

*H. implement: new framework for fixing things
	a) disable=alignment  -> fixed=A fixed=alignment
	b) disable=alignment_branch -> fixed=A- fixed=leaves-alignment
	c) disable=tree             -> fixed=tau fixed=topology
	d) disable=frequencies      -> fixed=pi  fixed=frequencies

------------------------------


I. fix: gamma+frequency+INV

---------------------------------- Clock -------------------------------------

Make rooted trees know which branches point
 (a) towards a node (parent branch)
 (b) away from a node (child_branches - just branches_after(parent_branch)).

Q. So, how do we keep RootedTree's extra info up-to-date if methods from Tree can
still be called?
A. Public tree methods cannot disturb this setup.

Goals for generality
 1. allow sampling from relaxed-clock && the transducer model && the RNA model, as well as T+A
 2. allow submodels to share parameters
 3. separate prior from 'model'
 4. ability to construct samplers from a generic language description. (least important)

------------------------------------------------------------------------------
[FIXED] s-parameters aren't accepted/sampled often enough.

[FIXED] implement: 'f' parameter in ReversibleMarkov models

[FIXED] check: memory usage -> EF-Tu/12d ~ 120 Mb RAM ?
        (memory usage decreased)

[DONE]  make a separate DEBUG_CACHING

[DONE]  make change_parameters() debugging code call a function which outputs
        substitution parameters.

[DONE]  allow comments in the alignment constraint file, and make
        the alignment index program spit out the actual letters for each
        column as comments. 

[DONE]  Make tree-dist-compare consider all partitions w/ PP > 0.5

[DONE]  Stop using PHYLIP.  Dealing with truncated names is too painful!

[DONE]  alignment-gild: use list<alignment> instead of vector<>...

[DONE]  alignment-draw: implement bg-colors=type, bg-whiteness = uncertainty

[DONE]  Use boost::program_options and remove arguments.{H,C}

[DONE]  Estimate best alignment by finding the alignment with the greatest
   sum of PP(i ~ j) for all aligned pairs (i,j);

[DONE]  Fix "showonly" - think of better name. (urgh - picked show-only)

[DONE]  Fix setting of 'fixed' flags in model.H/C, to allow 'unfixing' parameters
        and also modifying fixed/unfixed for sub-models.

[DONE]  Fix smodel parameters aren't being sampled.

[DONE]  Make A3-stripped.fasta work with alphabet=Codons

[DONE]  Implement YangM2

[DONE]  Stop requiring --random-tree-ok


[DONE]  Implement sorting of subA columns to provide stable names when columns are re-ordered.
[DONE]  Remove invalidation in 3way sampling, 5way sampling, and tri-sampling.

[DONE]  tree-dist-compare: 
	a) by default, only show 1 topology per file
	b) allow consensus to take a comma-separated list of values

[DONE] add '--first' and '--last' arguments to alignment-find

[DONE] Update tools to use boost::program_options.

[DONE] Fix tools/analyze-distances: make an interface to get likelihoods by column...

[DONE] Record when each constraint is first satisfied, and when all are satisfied
[DONE] Make align-constraint= work SPR_and_A

[DONE] Get intel compiler

[DONE] alignment-draw: put all colormaps on the same scale by making colormaps
       nested and colorschemes not nested.

[DONE] alignment-draw: allow names like --color-scheme=AA+contrast+whiten --scale LOD

[DONE] Allow name[arg] format in smodel description -> Empirical[wag] + gamma[4]

[DONE] Make alignment-draw simply use 'plain' if AU-file not specified.

[DONE] Better multi_freq model.  
    * frequency will be the frequency of the sub-model.
    * sum_l a(m,l) = 1
    * distribution(m) = sum_l a(m,l)*f(l)
    * f(m,l) = a(m,l) * f(l) / distribution(m)
    * two moves:
      - fiddle frequencies (as normal)
      - fiddle each row (one letter, all models)
    * prior -> each row is dirichlet
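
A minimal sketch of the two multi_freq formulas above, written as free functions. The names (Matrix, model_distribution, submodel_frequencies) are hypothetical and are not taken from the bali-phy sources. The code does not check how a(m,l) is normalized; it only applies the formulas as stated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// a[m][l]: weight linking sub-model m and letter l (hypothetical layout).
using Matrix = std::vector<std::vector<double>>;

// distribution(m) = sum_l a(m,l) * f(l)
std::vector<double> model_distribution(const Matrix& a, const std::vector<double>& f)
{
    std::vector<double> dist(a.size(), 0.0);
    for (std::size_t m = 0; m < a.size(); ++m) {
        assert(a[m].size() == f.size());
        for (std::size_t l = 0; l < f.size(); ++l)
            dist[m] += a[m][l] * f[l];
    }
    return dist;
}

// f(m,l) = a(m,l) * f(l) / distribution(m)
Matrix submodel_frequencies(const Matrix& a, const std::vector<double>& f)
{
    const std::vector<double> dist = model_distribution(a, f);
    Matrix fm(a.size(), std::vector<double>(f.size(), 0.0));
    for (std::size_t m = 0; m < a.size(); ++m) {
        assert(dist[m] > 0.0);  // a sub-model with zero weight has no frequencies
        for (std::size_t l = 0; l < f.size(); ++l)
            fm[m][l] = a[m][l] * f[l] / dist[m];
    }
    return fm;
}
```

By construction each fm[m] sums to 1 over letters, so it can be used directly as the frequency vector of sub-model m.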

[DONE] 	Integrating joint-A-T and the 'hack' to set indel prior parameters.

        
[DONE] Report constraints satisfied by initial alignment/tree.

[DONE] Don't report constraints satisfied if there aren't any.

[DONE] Change DualModel to MixtureModel[n]
        a) make a way of specifying the priors

[DONE] Fix gamma model so that we can put MultiModels underneath it w/o crashing!

[DONE] Convert probabilities to efloat_t in substitution, model priors, etc.

[DONE] Make standardized checks for indexing problems in sample-node / sample-two-nodes.

[DONE] Rename YangCodonModel to YangM0

[DONE] Using regular FP math for DP, with a scaling factor per cell.
       * moved creation of std::vector< > temporary variables out of loops.
       * switched to using std::valarray< > instead of std::vector< > 
         for state_{array,matrix}
       * inlined di,dj,dk,dl,dc in 3way.H from 3way.C
       * stopped copying a whole bunch of state_{array,matrix}'s around
       * moved initialization of s12_sub(,) to forward_cell( ).

[DONE] DPmatrix speedup - phase 2
       * add a boundary, and shift everything by (+1,+1)
       i then we can remove check to optionally break out of the inner loop
       * side effect is that we initialize as we go, even states whose predecessor
         is out of the square.
       * initialize boundaries in forward_square, with special case for (1,1)=S(0,0)
       * special case sample_path() for DPmatrixConstrained( ) because we don't initialize
         illegal state / location pairs.
      ii REMOVE THE FULL INITIALIZATION in DPmatrix::DPmatrix( )!

       Also:
       * shift s1_sub, s2_sub, s12_sub to matrix coordinates by adding padding to the beginning.

[DONE] Fix sub-alignment indices
  a) make A?::construct reconstruct subA indices
     - indices are uniquely defined - reconstructing from scratch
       should not change them.
     - indices are dependent only on leaf-taxa alignment & tree
  b) reconstruct subA indices when tree changes 
     - sample-tri.C for sample-topology-SPR.C

[DONE] Don't store more than 1 DPmatrix at a time - free memory afterwards unless debugging.

[DONE] Add asymmetric MH proposal capability.

[DONE] Convert all densities to use log_double_t for *_pdf().

[DONE] Implement asymmetric proposals for sample_*_multi( );

  b) implement checking when using this routine

[DONE] Implement choose_MH( ).

[DONE] Write alignment-consensus.

[DONE] Start fiddling the 'f' parameter, but making it fixed by default.

[DONE] Allow specifying a non-HKY model underneath the Yang M0 model.

[DONE] (a) Estimate branch lengths for multifurcating trees
       (b) Pass the c50 tree w/ lengths to alignment-reorder

[DONE] Given a c50 tree, keep partitions that imply sub-partitions.
       If two partitions imply a sub-partition, only keep the one
         with the smallest min(count(group1),count(group2))

[DONE] tree_sample cleanup part I.
       * Store leaf-sized bitmasks per branch.
       * Store leaf names once per tree_sample, not per-tree.
       * Reduces 29Mb -> 6Mb (factor of 5 decrease)
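
A rough sketch of the storage layout described above: leaf names stored once per sample, and one leaf-sized bitmask per branch. All names here are hypothetical, and std::vector<bool> is only a stand-in for whatever bitset type the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One bit per leaf (assumption: std::vector<bool> as the bitmask type).
using LeafMask = std::vector<bool>;

struct TreeSampleSketch {
    std::vector<std::string> leaf_names;       // stored once for the whole sample
    std::vector<std::vector<LeafMask>> trees;  // trees[t][b] = leaves on one side of branch b
};

// A branch splits the leaves in two, so a mask and its complement describe
// the same partition. Flip the mask if needed so that leaf 0 is always unset.
LeafMask canonical(LeafMask mask)
{
    assert(!mask.empty());
    if (mask[0]) mask.flip();
    return mask;
}

// Two masks name the same partition if their canonical forms match.
bool same_partition(const LeafMask& a, const LeafMask& b)
{
    return a.size() == b.size() && canonical(a) == canonical(b);
}
```

With this layout, comparing partitions across trees is a bitwise comparison and never touches the leaf names.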

[DONE] Call get_Ml_sub_partitions_and_counts only ONCE.
       * pull out combinations of badly rooted branches.
       * pull out (A,B):-X as (-A-X) and (-B-X)
       * cache computed PP for all sub-partitions,
       * find M[l] branches at each level l in the cache

[DONE] Plot the number of supported branches for EVERY level.  
       Then we could plot branches vs level, which is less sensitive
        than looking at presence or absence at a SINGLE level.

[DONE] Do a better job at finding the minimal number of c50 branches
        that imply a partition.

[DONE] Make a 'Mixing' directory in addition to 'Results' and 'Work'.

[DONE] tree-to-srq: take partitions as input, not just trees.

[DONE] Make a ??? letter for Codon alphabets.

[DONE] Remove '--with-stop', and add new alphabet names "Amino Acids + stop"
       and "Codons + stop"  

[DONE] Print blocksize, seed, and pseudo count on the same line in [ ].

[DONE] Separate tree-dist-compare into two programs: 
        a) trees-consensus estimates MAP and M[l] topologies for a
           single tree sample 
        b) trees-bootstrap bootstraps support for partitions, possibly
           from multiple tree samples.
          i) speed up the trees-bootstrap by using the same samples
             for each predicate, so that we need only generate the
             random numbers once.

[DONE] Print N or X instead of *

[DONE] Stop computing leaf partition sets.
       Only compute+cache partition masks when we need them.

[DONE] Cleanup "unused parameter" and other warnings.

[DONE] trees-bootstrap:
       
     a) make the number of samples in the bootstrap a parameter
     b) use pseudocounts if specified.

[DONE] Make tree a virtual public base class for SequenceTree and RootedTree?

[DONE] Implement letter classes.
       a) Allow reading Codon alphabet w/ individual letter wildcards.
       b) Handle letter classes in parsimony analysis.
       c) Allow letter classes in other places.

[DONE] Put Data/ under version control.

[DONE] alignment-cut

[DONE] Build Windows binaries (cygwin now has gcc 3.4)

[DONE] Build MacOS binaries: 
   + 10.3.9 and 10.4.x + 
   + 10.3.x < 10.3.8

[DONE] Use just one genetic_code.dat (the RNA version.)

[DONE] Merge all continuous parameters into one file 'p' that can be
   read by tracer.

[DONE] Write to a file 'out' instead of cout.

[DONE] Improve mixing for branch length mean 'mu'

[DONE] Fix sampling of frequency parameters by using dirichlet proposal.

[DONE] Allow 'mu' to be fixed.

[DONE] Complain instead of overwriting files.

[DONE] Read arguments from a config file.

[DONE] Change confusing "--gaps=star" to "--traditional".
       Don't have an indel sub-model in this case.

[DONE] Create proposal objects like 'between(0,1,shift_gaussian)'
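
One way a composable proposal such as between(0,1,shift_gaussian) could be sketched: wrap an inner proposal and fold its result back into [lo, hi] by reflection. The Proposal type, reflect_into, and the choice of reflection are all assumptions made for illustration, not a description of how bali-phy implements it.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// A proposal maps the current value to a proposed value (hypothetical type).
using Proposal = std::function<double(double)>;

// Fold x back into [lo, hi] by reflecting at the boundaries as often as needed.
double reflect_into(double x, double lo, double hi)
{
    assert(lo < hi);
    const double width = hi - lo;
    double y = std::fmod(x - lo, 2.0 * width);  // position within one reflection period
    if (y < 0.0) y += 2.0 * width;
    return lo + (y <= width ? y : 2.0 * width - y);
}

// between(lo, hi, inner): run the inner proposal, then reflect into [lo, hi].
Proposal between(double lo, double hi, Proposal inner)
{
    return [lo, hi, inner](double x) { return reflect_into(inner(x), lo, hi); };
}
```

Usage would look like `Proposal p = between(0.0, 1.0, shift);` where `shift` is any callable taking and returning a double. Reflection is chosen here because it keeps a symmetric random-walk proposal symmetric, so no extra proposal ratio is needed.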

[DONE] Remove Model::fiddle() [ and also super_fiddle() ] and implement
       external proposals.

[DONE] Output number and length of gaps to Tracer file.

[DONE] Have bali-phy open a directory to store files in.

[DONE] Remove 'super_parameters_'

[DONE] Models/SuperModels: Only recalculate models that changed, and allow
                            recalc to avoid recalculating cached stuff when e.g. "mu"
                            changes.

[DONE] Read (unset) settings from ~/.bali-phy.

[DONE] Separate P(S,x) from start_pi(x)

[DONE] DISTRIBUTION of %-identity?

[DONE] Implement codon-frequencies from amino-acid frequencies [no codon  bias].

[DONE] Go back to permitting trees w/ (e.g. root) nodes of degree 2.

[DONE] Improve branch-mean-lengths

[DONE] Make HTML alignments use mono-space fonts.

[DONE] Remove 'Yang' from M? models. (code and options.)

[DONE] Handle consensus alignments in alignment-gild.

[DONE] Improve discretization of distributions.

[DONE] Forbid values of sigma/mu that are too large
       by using the discretization mean.

[DONE] Separate Data/ files and examples.

[DONE] Allow specifying the location of GSL.

[DONE] Don't crash on 'SKIP=<> make' in --traditional mode.

[DONE] Draw MC tree using new tool 'draw-graph'.

[DONE] Gamma: be less fragile to large sigma/mu proposals.

[DONE] Implement topology constraints.

[DONE] Implement alignment constraints.

[DONE] Remove spaces from alphabet names.

[DONE] Improve initial convergence by fixing lambda/delta for first 5 iterations.

[DONE] Speed up loading of (large) alignments with many sequences.

[DONE] Implement general distributions 
         - but provide a parameterization by sigma/mu

[DONE] Fix Beta distribution - don't assume the natural scale is 1.0!

[DONE] Handle character 13 (PC format) in alignment and tree files.

[DONE] Correctly create MAP-AU2 for DNA&RNA alignments.

[DONE] Report nuc- and aa- tree lengths in 1.p and alignment-info

[DONE] Use boost::shared_ptr<> instead of my RefPtr<> in refcount.H

[DONE] Reduce the time spent copying things:
       + use a new cow_ptr<> based on boost::shared_ptr<> 

[DONE] Fix bug in sampling 'epsilon' :-)

[DONE] Switch from '!' to '*' for stop codon, and '+' to '.' for default
       wildcard.

[DONE] Load and use partition-specific *substitution* models.

[DONE] Load and use partition-specific *indel* models.

[DONE] Add a sorted( ) wrapper for dirichlet proposals, so we can
       stop trying to sort DP::rate*'s in recalc(), which is a hack.

[DONE] Handle conflicting partial orders when adding a homology pair to the
       partial alignment.

[DONE] Output build FLAGS and ARCH information as part of the version information.

[DONE] FIX gamma quantile function to not return NaN ever.

[DONE] Allow partitions to share scales.

[DONE] Create separate MOVES for (e.g.) HKY::kappa, pi* in each SModel.

[DONE] Stop abusing valarray (comparing w/ unequal lengths)

[DONE] Use dynamic_bitset<> instead of valarray<bool> for sets.

[DONE] Allow a gamma branch prior.

[DONE] Read NEXUS tree files.

[DONE] P17. Speed up trees-distances so that we can compare more distributions, or distributions with more trees.

[DONE] P19. Make bali-phy compile with -std=c++0x

[DONE] Fix choose_MH( ) to be good enough that adding more options is always an improvement.

[DONE] Automatic step-size adjustment for Slice Sampling.

[DONE] Make SPR correctly handle multiple attachment sites and also resample the alignment.

[DONE] Decrement total CI estimates up by one to make PSRF-80% usable for integers.

[DONE] Add optional upper and lower bounds on parameters, use in slice and MH moves.

[DONE] Set bounds on mu to [0, 0.5] during pre-burnin.

[DONE] Improve initial convergence: decouple branch lengths and indel probabilities during initial burnin.

[DONE] Decrease memory use when there are lots of gaps, avoid peeling characters to nodes where they are -.

*IDEAS*

 - We need better ways to compute marginal likelihoods.

 - for HIV data, we want samples from DIFFERENT PATIENTS, not
   from the same patient, because recombination is limited to
   within-patient samples.

   if we can avoid co-infection, then we are good!

   (a) check Shankarappa samples from the HIV database...
   (b) ? ask Marc about asking people if we can use their data
       or asking questions about their data.

 - find some papers for the journal club?

 - introduce alphabet models - such as codon model frequencies depending
   on [A],[T],[G],and [C].  We allow them to deviate, but put evidence on
   them NOT deviating.

 - consider modelling functional divergence - Karin's branch problem?

 - for alignment - how do we balance "extra high rate of change on this branch"
   (e.g. different aa on each side ) with "unalign the two sides of this
   branch"?

 - if we can take a <X>% confidence region in tree space, then any strict
   consensus measure of that region would be a level-<X> consensus tree.

 - we could search for the <X>% of the samples that would give the most
   resolved tree.

 - consider a version of pair-HMM with 2x as many states: we would have a 
   "slow" and "fast" version of M,G1,G2 in which indels have different
   rates.

 -  more complex model: have 'slow' and 'fast' be properties of the
   individual letters in the state.  (gamma-distributed indel rates.)
   + this makes a "spatial" model, if it is done right.

 -  Homoplasy and variation in column frequencies.  
   a) have a 'two-letter' model for HIV sequence DNA.

 - "Autocorrelation" as a function of distance

   o graph distances as a function of the posterior LOD score of the events.

   o graph probability of the transition from 1->0 (or 1->0->1) vs time AND PP.

 - use proxy for Likelihood to propose new states.
   o look at the number of SPR changes required to reach "nearby" trees,
     and to move between "islands".

 -  IDEA #1: simulate P(Y|T,tau) as \prod P(Y[i][j]|T,tau)^n[i][j]
            where n[i][j] = 1/2**(D(i,j)-1) 
            where D(i,j) is the number of edges between nodes i and j.
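
A small sketch of the exponent in IDEA #1, assuming D(i,j) >= 1 and working on the log scale. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// n[i][j] = 1 / 2^(D(i,j) - 1), where D(i,j) is the number of edges between i and j.
double pair_weight(int edge_distance)
{
    assert(edge_distance >= 1);
    return std::ldexp(1.0, 1 - edge_distance);  // 1.0 * 2^(1 - D)
}

// log of P(Y[i][j]|T,tau)^n[i][j] = n[i][j] * log P(Y[i][j]|T,tau)
double weighted_log_term(double log_pair_prob, int edge_distance)
{
    return pair_weight(edge_distance) * log_pair_prob;
}
```

Adjacent nodes (D = 1) get full weight, and the weight halves with each extra edge, so summing weighted_log_term over all pairs gives the log of the product in IDEA #1.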

 -  IDEA #2: Determine points C[i] as centers for n bins of a distribution.
            + bin boundaries will be at sqrt(C[i]*C[i+1]).
            + define f(x) = C[i] if x is in the i-th bin.
            + then we 
              - constrain E X = 1 under f(x).
              - minimize E |log(X) - log(f(X))|

	    + this will give us bins optimally spaced on the log scale.

    (Note: If the transform function is the quantile function, instead of the
     log function, then we regain the current standard behavior! :)
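
A sketch of the binning step in IDEA #2 only: given centers C[i], place boundaries at sqrt(C[i]*C[i+1]) and map x to the center of its bin. It assumes the centers are positive and strictly increasing. It does not choose the centers, so the constraint E X = 1 and the minimization are not implemented here. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Boundary between bins i and i+1 is the geometric mean sqrt(C[i]*C[i+1]),
// i.e. the midpoint of the two centers on the log scale.
std::vector<double> bin_boundaries(const std::vector<double>& C)
{
    assert(!C.empty());
    std::vector<double> b;
    for (std::size_t i = 0; i + 1 < C.size(); ++i) {
        assert(0.0 < C[i] && C[i] < C[i + 1]);
        b.push_back(std::sqrt(C[i] * C[i + 1]));
    }
    return b;
}

// f(x) = C[i] if x is in the i-th bin.
double discretize(const std::vector<double>& C, double x)
{
    const std::vector<double> b = bin_boundaries(C);
    // Index of the first boundary strictly greater than x is the bin index.
    const std::size_t i =
        static_cast<std::size_t>(std::upper_bound(b.begin(), b.end(), x) - b.begin());
    return C[i];
}
```

For centers {0.25, 1, 4} the boundaries are 0.5 and 2, so every x is mapped to the center nearest to it on the log scale.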

 - If we look at posterior rate distributions, then we will want to know
   how the rates are distributed spatially.  (This might explain why the
   shape of the rate distribution varies between proteins.)

   How much of the rate distribution can be explained by e.g. exposure
   to solvent?

   If we add alpha/beta/loop notation to each of the internal node residues,
   then this might help us get more realistic rates, as well as indel
   hot and cold spots.

 - How do indel hot and cold spots map to interior/exterior of 3D structure?

 - Can we label each residue with the probability that it will accept deletion?
   Can we label proposed indels with a relative probability of occurrence?
   (Invariant residues/blocks are easier, because they are never inserted.)
   Can we handle a model with both unequal proposals and acceptance rates?

 - If the invariant sites vary over time, we might achieve some fairly interesting
   decoupling of change from linear time.  If sites become invariant during some
   period of time, then they might change fairly little over a rather long time.
   If some sites become variable, then they might change a lot over a fairly short
   time.  If sites change their invariancy in blocks... then we could have an 
   interesting interaction of changes in selection and fitness effects over the
   tree.

 - Can we estimate selection pressure over the tree?  
   We could integrate out the parts we don't know.

 - Can we separate the alphabet/frequency model from the exchangeability
   model for substitution models?

   * This way, YangM0 doesn't have an HKY::f parameter.
   * This way, the *::f parameter is handled much more naturally.
   * This way, we can handle frequency parameters as parameters.
   * This way, we don't have a fixed *::f parameter, unless we need it.
   * This way, we can have more interesting codon frequency models.

 - If we have both coding and non-coding DNA from an organism (hopefully
   from a number of genes, and, I suppose, non-coding regions), then we
   could actually fit a model that determined the frequency of codons from
   the competing pressure of nucleotide and amino acid forces.

   Actually, we might need three different components:

   a) nucleotide mutation pressure
   b) nucleotide content selection pressure
   c) amino acid selection pressure

   Might only amino acid content change between genes?  In that case, we might
   be able to estimate all the factors using only coding regions!

-  Model RNA as a sequence with variable equilibrium distributions...
   There are only 2^4 = 16 rate classes if everything is 0/1.
   Entropy???     

-  Site-dependent equilibrium frequencies:
  + Question: does entropy of frequencies affect branch lengths?
  + Combine saturation with (site-dependent) rate-change across the tree ?
  + Check effect of non-equivalent equilibrium frequencies.

- Quick ways to score gaps: tell Jeff.
   + Push gaps as high as possible in each column.
   + No nested gaps
   + Linear gaps

- Don't show SRQ or sum plots for partitions w/ less than <N=10> regenerations,
     a) But, perhaps the real issue is generating a confidence band for SRQ plots
     b) Develop an idea of how to do a statistical test based on SRQ plots...
        [ Find the worst sub-section - the flattest line that is longest ]
        [ This will give us the worst mixing rate that we can reject ]

-----
HOWTO
- How do I estimate constraints?
  * use alignment indices
- How do I fix the tree?
  * use --disable=tree
- 
