Commit 3429d50

Update README.md
1 parent 4eacb71 commit 3429d50

1 file changed

Lines changed: 44 additions & 33 deletions

README.md

````diff
@@ -9,11 +9,7 @@ SYSTEM REQUIREMENTS:
 ========================================================================
 The preseq software will only run on 64-bit UNIX-like operating
 systems and was developed on both Linux and Mac. The preseq software
-requires a C++ compiler that supports C++11. The GNU Scientific
-Library (GSL) is **only required** if users would like to use the
-`bound_pop` module. It can be installed using `apt` on Linux, using
-`brew` on macOS, or from source available
-[here](http://www.gnu.org/software/gsl).
+requires a C++ compiler that supports C++11.
 
 INSTALLATION:
 ========================================================================
````
````diff
@@ -49,12 +45,6 @@ you must specify the location like this:
 $ ../configure --enable-hts CPPFLAGS='-I /path/to/htslib/headers' \
 LDFLAGS='-L/path/to/htslib/lib'
 ```
-We no longer require the GNU Scientific Library (GSL) for all modules
-except for `bound_pop`. To use `bound_pop`, please install GSL and
-configure with the following flag:
-```
-$ ../configure --enable-gsl
-```
 5. Compile and install the tools:
 ```
 $ make
````
````diff
@@ -74,11 +64,7 @@ If the desired input is in `.bam` format, `htslib` is required. Type
 ```
 make HAVE_HTSLIB=1 all
 ```
-To use the `bound_pop` module, type
-```
-make HAVE_GSL=1 all
-```
-to make the programs. The HTSLib library can be obtained here:
+The HTSLib library can be obtained here:
 http://www.htslib.org/download.
 
 INPUT FILE FORMATS:
````
````diff
@@ -103,34 +89,59 @@ USAGE EXAMPLES:
 Each program included in this software package will print a list of
 options if executed without any command line arguments. Many of the
 programs use similar options (for example, output files are specified
-with '-o'). To predict the yield of a future experiment, use `lc_extrap`.
-For the most basic usage of `lc_extrap` to compute the expected yield,
-use the command:
+with '-o').
+
+We have provided a data directory to test each of our programs.
+Change to the `data` directory and try some of our commands.
+To predict the yield of a future experiment, use `lc_extrap`.
+For the most basic usage of `lc_extrap` to compute the expected yield,
+use the command on the following data:
+```
+preseq lc_extrap -o yield_estimates.txt SRR1003759_5M_subset.mr
 ```
-preseq lc_extrap -o yield_estimates.txt input.bed
+If the input file is in `.bam` format, use the `-B` flag:
 ```
-If the input file is in .bam format, use the command:
+preseq lc_extrap -B -o yield_estimates.txt SRR1106616_5M_subset.bam
 ```
-preseq lc_extrap -B -o yield_estimates.txt input.bam
+For the counts histogram format, use the '-H' flag:
 ```
+preseq lc_extrap -H -o yield_estimates.txt SRR1301329_1M_read.txt
+```
+
 The yield estimates will appear in yield_estimates.txt, and will be a
 column of future experiment sizes in `TOTAL_READS`, a column of the
 corresponding expected distinct reads in `EXPECTED_DISTINCT`, followed
 by two columns giving the corresponding confidence intervals.
 
-To investigate the past yield of an experiment, use `c_curve`. For the
-most basic usage, use the command:
-```
-preseq c_curve -o estimates.txt input.bed
-```
-If the input file is in .bam format, use the command:
+To investigate the past yield of an experiment, use `c_curve`.
+`c_curve` can take in the same file formats as `lc_extrap` by using
+the same flags. The estimates will appear in estimates.txt with two
+columns. The first column gives the total number of reads in a
+theoretically smaller experiment and the second gives the corresponding
+number of distinct reads.
+
+`bound_pop` provides an estimate for the species richness
+of the sampled population. The input file formats and corresponding flags
+are identical to `c_curve` and `lc_extrap`. The output provides the median
+species richness in the first column and the confidence intervals
+in the next two columns.
+
+Finally, `gc_extrap` predicts the expected genomic coverage for a future experiment.
+It produces the coverage in an output format identical to `lc_extrap`. `gc_extrap`
+can only take in files in BED and mapped reads format (using the `-B` flag for BED):
+
 ```
-preseq c_curve -B -o estimates.txt input.bam
+preseq gc_extrap -B -o coverage_estimates.txt SRR1003759_5M_subset.mr
 ```
-The estimates will appear in estimates.txt with two columns. The
-first column gives the total number of reads in a theoretically
-smaller experiment and the second gives the corresponding number of
-distinct reads.
+
+More data is available in the `additional_data.txt` file in the `data` directory.
+For an extended write-up on our programs, please read the manual in the `docs`
+directory.
+
+UPDATES TO VERSION 3.0.2
+========================================================================
+GSL has been completely removed, and a data directory has been added for
+users to test our programs.
 
 UPDATES TO VERSION 3.0.1
 ========================================================================
````
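The updated usage section says `lc_extrap` writes a `TOTAL_READS` column, an `EXPECTED_DISTINCT` column, and two confidence-interval columns. Below is a minimal Python sketch for loading that file. It assumes the output is tab-separated with a single header line. The README does not name the interval columns, so the parser reads them by position; `YieldRow` and `read_yield_estimates` are hypothetical helper names, not part of preseq.

```python
import csv
from typing import Iterable, List, NamedTuple


class YieldRow(NamedTuple):
    total_reads: float        # TOTAL_READS
    expected_distinct: float  # EXPECTED_DISTINCT
    lower_ci: float           # first confidence-interval column (read by position)
    upper_ci: float           # second confidence-interval column (read by position)


def read_yield_estimates(lines: Iterable[str]) -> List[YieldRow]:
    """Parse lc_extrap output, assumed to be tab-separated with one header line."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    # Only the two column names given in the README are checked.
    if header is None or header[:2] != ["TOTAL_READS", "EXPECTED_DISTINCT"]:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        # The interval columns are taken by position, not by name.
        rows.append(YieldRow(*(float(x) for x in fields[:4])))
    return rows
```

An open file handle also works as input. For example, `with open("yield_estimates.txt") as f: rows = read_yield_estimates(f)` returns one row per extrapolated experiment size.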

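The updated README adds a `-H` example for input in the counts histogram format but does not spell that format out. The sketch below shows one way to produce such a file. It assumes, without confirmation from this commit, that the histogram is a two-column text file in which each line gives a count j and the number of distinct reads observed exactly j times. The tab separator and the `read_keys` used for duplicate detection are also assumptions. Here the keys are hypothetical (chromosome, start, strand) tuples for mapped reads.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def counts_histogram(read_keys: Iterable[Hashable]) -> Dict[int, int]:
    """Return {j: number of distinct keys observed exactly j times}."""
    copies = Counter(read_keys)  # key -> how many reads share that key
    return dict(sorted(Counter(copies.values()).items()))


def format_histogram(hist: Dict[int, int]) -> str:
    """Render one 'j<TAB>n_j' line per observed count, in ascending j (assumed layout)."""
    return "".join(f"{j}\t{n_j}\n" for j, n_j in sorted(hist.items()))
```

As a sanity check, the sum of j times n_j over the histogram always equals the total number of reads.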

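The README describes `c_curve` as reporting, for theoretically smaller experiments, the total number of reads and the corresponding number of distinct reads. Suppose the data are summarized as a counts histogram, with n_j distinct reads each observed exactly j times and N reads in total. If a subsample of m reads is drawn without replacement, a read with j copies is missed with probability C(N - j, m) / C(N, m). The expected number of distinct reads is therefore the sum over j of n_j times (1 - C(N - j, m) / C(N, m)).

The sketch below computes that expectation. It illustrates the quantity only and is not necessarily how `c_curve` computes it, since the program may subsample at random instead. The names `expected_distinct`, `complexity_curve`, and `step` are hypothetical.

```python
from typing import Dict, List, Tuple


def expected_distinct(hist: Dict[int, int], m: int) -> float:
    """Expected distinct reads in a subsample of m reads drawn without replacement.

    hist maps j -> number of distinct reads observed exactly j times.
    """
    n_total = sum(j * n_j for j, n_j in hist.items())
    if not 0 <= m <= n_total:
        raise ValueError("subsample size must be between 0 and the total read count")
    expected = 0.0
    for j, n_j in hist.items():
        # P(none of the j copies is drawn) = C(N - j, m) / C(N, m),
        # evaluated as a running product to avoid huge binomial coefficients.
        p_missed = 1.0
        for i in range(j):
            p_missed *= (n_total - m - i) / (n_total - i)
        expected += n_j * (1.0 - p_missed)
    return expected


def complexity_curve(hist: Dict[int, int], step: int) -> List[Tuple[int, float]]:
    """(subsample size, expected distinct reads) at each multiple of step up to N."""
    n_total = sum(j * n_j for j, n_j in hist.items())
    return [(m, expected_distinct(hist, m)) for m in range(step, n_total + 1, step)]
```

At m = N the expectation equals the observed number of distinct reads. At m = 1 it is exactly one read, which gives two quick checks on the formula.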