This repository was archived by the owner on Jan 7, 2021. It is now read-only.

Commit 9660f9f

committed
Drafted documentation for the previously undocumented get_page_text feature being fixed in #94. [skip ci]
1 parent e852deb commit 9660f9f

8 files changed

Lines changed: 39 additions & 2 deletions


1.86 KB
Binary file not shown.
475 Bytes
Binary file not shown.

docs/_build/html/_sources/documents.txt

Lines changed: 10 additions & 0 deletions
@@ -156,6 +156,16 @@ Metadata
 
 Returns the URL that contains the full text of the document, as extracted from the original PDF by DocumentCloud.
 
+.. method:: document_obj.get_page_text(page)
+
+Submit a page number and receive the raw text extracted from it by DocumentCloud.
+
+>>> obj = client.documents.get('1088501-adventuretime-alta')
+>>> txt = obj.get_page_text(1)
+# Let's print just the first line
+>>> print txt.decode().split("\n")[0]
+STATE OF CALIFORNIA- HEALTH AND HUMAN SERVICES AGENCY
+
 .. attribute:: document_obj.id
 
 The unique identifer of the document in DocumentCloud's system. Typically this is a string that begins with a number, like ``83251-fbi-file-on-christopher-biggie-s.malls-wallace``
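The `get_page_text` example in this hunk is Python 2 (it uses the `print` statement) and calls `.decode()` on the result before taking the first line. A minimal Python 3 sketch of that same step follows. It assumes the value returned by `get_page_text` may be either `bytes` or an already-decoded `str`, and that the text is UTF-8; neither is stated in the docs. `first_line` is a hypothetical helper, not part of the library.

```python
def first_line(page_text, encoding="utf-8"):
    """Return the first line of a page's extracted text.

    Hypothetical helper, not part of python-documentcloud. ``page_text`` is
    assumed to be what ``document_obj.get_page_text(page)`` returned: raw
    bytes or an already-decoded str. UTF-8 is an assumed default encoding.
    """
    if isinstance(page_text, bytes):
        page_text = page_text.decode(encoding)
    # splitlines() also strips "\r\n" endings; it returns [] for "", so guard.
    lines = page_text.splitlines()
    return lines[0] if lines else ""


# Python 3 counterpart of the interactive session (needs network access and
# the library installed, so it is left commented out here):
# obj = client.documents.get('1088501-adventuretime-alta')
# print(first_line(obj.get_page_text(1)))
```

Using `splitlines()` rather than `split("\n")` avoids leaving a stray `\r` on the line if the extracted text happens to use Windows line endings.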

docs/_build/html/documents.html

Lines changed: 13 additions & 0 deletions
@@ -251,6 +251,19 @@ <h2>Metadata<a class="headerlink" href="#metadata" title="Permalink to this head
 <dd><p>Returns the URL that contains the full text of the document, as extracted from the original PDF by DocumentCloud.</p>
 </dd></dl>
 
+<dl class="method">
+<dt id="document_obj.get_page_text">
+<tt class="descclassname">document_obj.</tt><tt class="descname">get_page_text</tt><big>(</big><em>page</em><big>)</big><a class="headerlink" href="#document_obj.get_page_text" title="Permalink to this definition"></a></dt>
+<dd><p>Submit a page number and receive the raw text extracted from it by DocumentCloud.</p>
+<div class="highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">obj</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">documents</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">&#39;1088501-adventuretime-alta&#39;</span><span class="p">)</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">txt</span> <span class="o">=</span> <span class="n">obj</span><span class="o">.</span><span class="n">get_page_text</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
+<span class="go"># Let&#39;s print just the first line</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span> <span class="n">txt</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
+<span class="go">STATE OF CALIFORNIA- HEALTH AND HUMAN SERVICES AGENCY</span>
+</pre></div>
+</div>
+</dd></dl>
+
 <dl class="attribute">
 <dt id="document_obj.id">
 <tt class="descclassname">document_obj.</tt><tt class="descname">id</tt><a class="headerlink" href="#document_obj.id" title="Permalink to this definition"></a></dt>

docs/_build/html/genindex.html

Lines changed: 5 additions & 1 deletion
@@ -238,10 +238,14 @@ <h2 id="G">G</h2>
 <dt><a href="projects.html#project_obj.get_document">get_document() (project_obj method)</a>
 </dt>
 
+
+<dt><a href="projects.html#client.projects.get_or_create_by_title">get_or_create_by_title() (client.projects method)</a>
+</dt>
+
 </dl></td>
 <td style="width: 33%" valign="top"><dl>
 
-<dt><a href="projects.html#client.projects.get_or_create_by_title">get_or_create_by_title() (client.projects method)</a>
+<dt><a href="documents.html#document_obj.get_page_text">get_page_text() (document_obj method)</a>
 </dt>
 
 </dl></td>

docs/_build/html/objects.inv

30 Bytes
Binary file not shown.

docs/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/documents.rst

Lines changed: 10 additions & 0 deletions
@@ -156,6 +156,16 @@ Metadata
 
 Returns the URL that contains the full text of the document, as extracted from the original PDF by DocumentCloud.
 
+.. method:: document_obj.get_page_text(page)
+
+Submit a page number and receive the raw text extracted from it by DocumentCloud.
+
+>>> obj = client.documents.get('1088501-adventuretime-alta')
+>>> txt = obj.get_page_text(1)
+# Let's print just the first line
+>>> print txt.decode().split("\n")[0]
+STATE OF CALIFORNIA- HEALTH AND HUMAN SERVICES AGENCY
+
 .. attribute:: document_obj.id
 
 The unique identifer of the document in DocumentCloud's system. Typically this is a string that begins with a number, like ``83251-fbi-file-on-christopher-biggie-s.malls-wallace``
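Because `get_page_text` returns the text of one page per call, collecting a whole document's text means calling it once for each page. A minimal sketch follows. It assumes pages are numbered from 1 (as the `get_page_text(1)` example suggests) and takes both the fetch function and the page count as parameters, rather than guessing which attribute of `document_obj` holds the page count. `collect_page_texts` is a hypothetical helper, and UTF-8 decoding of `bytes` results is an assumption.

```python
from typing import Callable, Dict, Union

PageText = Union[bytes, str]


def collect_page_texts(
    get_page_text: Callable[[int], PageText], page_count: int
) -> Dict[int, str]:
    """Fetch the extracted text of every page, keyed by page number.

    Hypothetical helper: ``get_page_text`` would typically be the bound
    method ``document_obj.get_page_text``. Pages are assumed to be numbered
    from 1, and bytes results are assumed to be UTF-8.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    texts: Dict[int, str] = {}
    for page in range(1, page_count + 1):
        raw = get_page_text(page)
        # Normalise to str so callers need not care whether bytes came back.
        texts[page] = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return texts
```

With a real document this might be called as `collect_page_texts(obj.get_page_text, n)`, where `n` is the document's page count. Passing the bound method in keeps the helper testable without network access. Note that it issues one request per page.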
