@@ -9,11 +9,7 @@ SYSTEM REQUIREMENTS:
 ========================================================================
 The preseq software will only run on 64-bit UNIX-like operating
 systems and was developed on both Linux and Mac. The preseq software
-requires a C++ compiler that supports C++11. The GNU Scientific
-Library (GSL) is **only required** if users would like to use the
-`bound_pop` module. It can be installed using `apt` on Linux, using
-`brew` on macOS, or from source available
-[here](http://www.gnu.org/software/gsl).
+requires a C++ compiler that supports C++11.
 
 INSTALLATION:
 ========================================================================
@@ -49,12 +45,6 @@ you must specify the location like this:
 $ ../configure --enable-hts CPPFLAGS='-I /path/to/htslib/headers' \
     LDFLAGS='-L/path/to/htslib/lib'
 ```
-We no longer require the GNU Scientific Library (GSL) for all modules
-except for `bound_pop`. To use `bound_pop`, please install GSL and
-configure with the following flag:
-```
-$ ../configure --enable-gsl
-```
 5. Compile and install the tools:
 ```
 $ make
@@ -74,11 +64,7 @@ If the desired input is in `.bam` format, `htslib` is required. Type
 ```
 make HAVE_HTSLIB=1 all
 ```
-To use the `bound_pop` module, type
-```
-make HAVE_GSL=1 all
-```
-to make the programs. The HTSLib library can be obtained here:
+The HTSLib library can be obtained here:
 http://www.htslib.org/download.
 
 INPUT FILE FORMATS:
@@ -103,34 +89,59 @@ USAGE EXAMPLES:
 Each program included in this software package will print a list of
 options if executed without any command line arguments. Many of the
 programs use similar options (for example, output files are specified
-with '-o'). To predict the yield of a future experiment, use `lc_extrap`.
-For the most basic usage of `lc_extrap` to compute the expected yield,
-use the command:
+with '-o').
+
+We provide a `data` directory with example files for testing each program.
+Change to the `data` directory and try the commands below.
+To predict the yield of a future experiment, use `lc_extrap`.
+For the most basic usage of `lc_extrap` to compute the expected yield,
+run it on one of the example files:
+```
+preseq lc_extrap -o yield_estimates.txt SRR1003759_5M_subset.mr
 ```
-preseq lc_extrap -o yield_estimates.txt input.bed
+If the input file is in `.bam` format, use the `-B` flag:
 ```
-If the input file is in .bam format, use the command:
+preseq lc_extrap -B -o yield_estimates.txt SRR1106616_5M_subset.bam
 ```
-preseq lc_extrap -B -o yield_estimates.txt input.bam
+For input in the counts histogram format, use the `-H` flag:
 ```
+preseq lc_extrap -H -o yield_estimates.txt SRR1301329_1M_read.txt
+```
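The `-H` input is a counts histogram rather than a list of mapped reads. A minimal sketch of building one from a BED file with standard Unix tools follows. It assumes the histogram has two columns (a count `j`, then the number of distinct reads observed exactly `j` times) and that reads sharing chromosome, start, end and strand (BED columns 1-3 and 6) count as duplicates; both are our assumptions, so check the manual in `docs` before relying on them. `bed_to_hist` is a hypothetical helper, not part of preseq.

```sh
# Hypothetical helper: build a two-column counts histogram ("j<TAB>n_j")
# from BED input given as file arguments or on stdin.
# ASSUMPTION: duplicates are reads with identical chrom, start, end and
# strand (BED columns 1, 2, 3 and 6); preseq may define them differently.
bed_to_hist() {
  # Keep only the fields that define a duplicate, then group identical reads.
  cut -f1,2,3,6 "$@" | sort | uniq -c |
    # uniq -c puts each distinct read's copy number j in field 1;
    # tally how many distinct reads have each j.
    awk '{ n[$1]++ } END { for (j in n) print j "\t" n[j] }' |
    # awk array order is unspecified, so sort by j.
    sort -n -k1,1
}
```

For example, `bed_to_hist reads.bed > hist.txt` followed by `preseq lc_extrap -H -o yield_estimates.txt hist.txt` (file names hypothetical).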
+
 The yield estimates will appear in yield_estimates.txt, and will be a
 column of future experiment sizes in `TOTAL_READS`, a column of the
 corresponding expected distinct reads in `EXPECTED_DISTINCT`, followed
 by two columns giving the corresponding confidence intervals.
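One practical use of this output is finding how deep to sequence to reach a target number of distinct reads. A minimal sketch, assuming the file has a single header row, whitespace-separated columns in the order described above, and rows in increasing order of `TOTAL_READS`; `reads_for_distinct` is a hypothetical helper.

```sh
# Hypothetical helper: print the first TOTAL_READS value whose
# EXPECTED_DISTINCT reaches the target.
# Usage: reads_for_distinct TARGET_DISTINCT YIELD_FILE
# ASSUMPTIONS: one header row; column 1 is TOTAL_READS and column 2 is
# EXPECTED_DISTINCT; rows are in increasing order of TOTAL_READS.
reads_for_distinct() {
  awk -v target="$1" '
    NR == 1 { next }                              # skip the header row
    $2 >= target { print $1; found = 1; exit }
    END { if (!found) exit 1 }                    # target never reached
  ' "$2"
}
```

`reads_for_distinct 20000000 yield_estimates.txt` prints the smallest listed experiment size expected to give at least 20 million distinct reads, or returns a non-zero status if the extrapolated range never gets there. The answer is only as fine-grained as the spacing between rows.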
 
-To investigate the past yield of an experiment, use `c_curve`. For the
-most basic usage, use the command:
-```
-preseq c_curve -o estimates.txt input.bed
-```
-If the input file is in .bam format, use the command:
+To investigate the past yield of an experiment, use `c_curve`.
+`c_curve` accepts the same file formats as `lc_extrap`, selected with the
+same flags. The estimates are written to the file given with `-o`, in two
+columns. The first column gives the total number of reads in a
+theoretically smaller experiment and the second gives the corresponding
+number of distinct reads.
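As we understand it, `c_curve` builds its curve by randomly subsampling the observed reads. For a cross-check, the expected value of that curve can be computed in closed form from a counts histogram: a distinct read seen j times among N total reads is missed by a subsample of m reads drawn without replacement with probability C(N-j, m)/C(N, m), so the expected number of distinct reads is the sum over j of n_j * (1 - C(N-j, m)/C(N, m)). The sketch below evaluates that hypergeometric expectation; it is not how `c_curve` computes its output. It assumes a two-column histogram file (count j, number of distinct reads seen exactly j times), and `expected_distinct` is a hypothetical helper.

```sh
# Hypothetical helper: expected number of distinct reads in a random
# subsample of m reads drawn without replacement from the full data set.
# Usage: expected_distinct M HIST_FILE
# ASSUMPTION: HIST_FILE has lines "j n_j" (n_j distinct reads seen j times).
expected_distinct() {
  awk -v m="$1" '
    { j[NR] = $1; n[NR] = $2; N += $1 * $2 }      # N = total number of reads
    END {
      if (m > N) { print "subsample larger than total reads" > "/dev/stderr"; exit 1 }
      e = 0
      for (k = 1; k <= NR; k++) {
        # p = C(N-j, m) / C(N, m): chance that none of the j copies is drawn
        p = 1
        for (i = 0; i < j[k]; i++) p *= (N - m - i) / (N - i)
        e += n[k] * (1 - p)
      }
      printf "%.2f\n", e
    }
  ' "$2"
}
```

Writing the ratio of binomial coefficients as a product of j factors keeps every intermediate value between 0 and 1, so it does not overflow for realistic read counts.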
+
+`bound_pop` estimates the species richness of the sampled population,
+that is, the total number of distinct molecules it contains. The input
+file formats and corresponding flags are identical to those of `c_curve` and
+`lc_extrap`. The output gives the median species richness in the first
+column and the lower and upper confidence bounds in the next two columns.
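For a rough sanity check on the magnitude of a richness estimate, the classical Chao1 estimator can be computed from a counts histogram: S_obs + f1^2/(2*f2), where S_obs is the number of distinct reads observed and f1, f2 are the numbers of reads seen exactly once and exactly twice, falling back to the bias-corrected S_obs + f1*(f1-1)/2 when f2 is 0. This is a different, much simpler estimator, not the method `bound_pop` implements, so the two numbers need not agree. The sketch assumes a two-column histogram file (count j, number of distinct reads seen exactly j times), and `chao1` is a hypothetical helper.

```sh
# Hypothetical helper: classical Chao1 richness estimate from a histogram.
# This is NOT the estimator bound_pop implements; it is a rough cross-check.
# Usage: chao1 HIST_FILE
# ASSUMPTION: HIST_FILE has lines "j n_j" (n_j distinct reads seen j times).
chao1() {
  awk '
    { s += $2 }                  # S_obs: observed distinct reads
    $1 == 1 { f1 = $2 }          # reads seen exactly once
    $1 == 2 { f2 = $2 }          # reads seen exactly twice
    END {
      if (f2 > 0) est = s + f1 * f1 / (2 * f2)
      else        est = s + f1 * (f1 - 1) / 2   # bias-corrected form for f2 = 0
      printf "%.2f\n", est
    }
  ' "$1"
}
```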
+
+Finally, `gc_extrap` predicts the expected genomic coverage for a future experiment.
+It produces the coverage in an output format identical to `lc_extrap`. `gc_extrap`
+accepts only BED and mapped reads (`.mr`) files. BED input requires the `-B` flag,
+while the mapped reads example below needs no flag:
 ```
-preseq c_curve -B -o estimates.txt input.bam
+preseq gc_extrap -o coverage_estimates.txt SRR1003759_5M_subset.mr
 ```
-The estimates will appear in estimates.txt with two columns. The
-first column gives the total number of reads in a theoretically
-smaller experiment and the second gives the corresponding number of
-distinct reads.
+
+More data is available in the `additional_data.txt` file in the `data` directory.
+For an extended write-up on our programs, please read the manual in the `docs`
+directory.
+
+UPDATES TO VERSION 3.0.2
+========================================================================
+GSL has been completely removed, and a data directory has been added for
+users to test our programs.
 
 UPDATES TO VERSION 3.0.1
 ========================================================================