This repository was archived by the owner on Jan 7, 2021. It is now read-only.

Commit 7f3eec5

Added documentation for the new file_hash attribute to fix #91
1 parent db9509c commit 7f3eec5

12 files changed

Lines changed: 134 additions & 5 deletions
-18.2 KB
Binary file not shown.
-998 Bytes
Binary file not shown.
-5 Bytes
Binary file not shown.

docs/_build/html/_sources/documents.txt

Lines changed: 14 additions & 0 deletions
@@ -130,6 +130,20 @@ Metadata
 >>> obj.entities
 [<Entity: Angeles>, <Entity: FD>, <Entity: OO>, <Entity: Los Angeles>, ...
 
+.. attribute:: document_obj.file_hash
+
+A hash representation of the raw PDF data as a hexadecimal string.
+
+>>> obj = client.documents.get('1021571-lafd-2013-hiring-statistics')
+>>> obj.file_hash
+'872b9b858f5f3e6bb6086fec7f05dd464b60eb26'
+
+You could recreate this hexadecimal hash yourself using the `SHA-1 algorithm <https://en.wikipedia.org/wiki/SHA-1>`_.
+
+>>> import hashlib
+>>> hashlib.sha1(obj.pdf).hexdigest()
+'872b9b858f5f3e6bb6086fec7f05dd464b60eb26'
+
 .. attribute:: document_obj.full_text
 
 Returns the full text of the document, as extracted from the original PDF by DocumentCloud. Results may vary, but this will give you what they got. Currently, DocumentCloud only makes this available for public documents.
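The new `file_hash` entry states that the value is the SHA-1 digest of the raw PDF bytes, written as a hexadecimal string. A minimal sketch of checking that locally follows. It uses only Python's standard `hashlib`; the helper names `sha1_hexdigest`, `sha1_of_file` and `matches_file_hash` are hypothetical and are not part of python-documentcloud. It assumes, as the documentation example does, that `obj.pdf` returns the PDF as a `bytes` object.

```python
import hashlib


def sha1_hexdigest(data: bytes) -> str:
    """Return the SHA-1 digest of raw bytes as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file on disk in fixed-size chunks so a large PDF need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_file_hash(data: bytes, expected: str) -> bool:
    """Compare raw PDF bytes against a hex digest such as document_obj.file_hash.

    Hypothetical helper: lowercases `expected` defensively before comparing.
    """
    return sha1_hexdigest(data) == expected.lower()
```

With a fetched document you could call `matches_file_hash(obj.pdf, obj.file_hash)`; for a PDF already saved to disk, compare `sha1_of_file(path)` against `obj.file_hash` instead. Feeding the file to `update()` in chunks yields the same digest as hashing the whole byte string at once, without holding the entire PDF in memory.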

docs/_build/html/documents.html

Lines changed: 38 additions & 1 deletion
@@ -23,7 +23,9 @@
 <script type="text/javascript" src="_static/jquery.js"></script>
 <script type="text/javascript" src="_static/underscore.js"></script>
 <script type="text/javascript" src="_static/doctools.js"></script>
-<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="next" title="Projects" href="projects.html" />
+<link rel="prev" title="Getting started" href="gettingstarted.html" />
 </head>
 <body>
 <div class="related">
@@ -32,6 +34,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 accesskey="I">index</a></li>
+<li class="right" >
+<a href="projects.html" title="Projects"
+accesskey="N">next</a> |</li>
+<li class="right" >
+<a href="gettingstarted.html" title="Getting started"
+accesskey="P">previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>
@@ -209,6 +217,23 @@ <h2>Metadata<a class="headerlink" href="#metadata" title="Permalink to this head
 </div>
 </dd></dl>
 
+<dl class="attribute">
+<dt id="document_obj.file_hash">
+<tt class="descclassname">document_obj.</tt><tt class="descname">file_hash</tt><a class="headerlink" href="#document_obj.file_hash" title="Permalink to this definition"></a></dt>
+<dd><p>A hash representation of the raw PDF data as a hexadecimal string.</p>
+<div class="highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">obj</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">documents</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">&#39;1021571-lafd-2013-hiring-statistics&#39;</span><span class="p">)</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">obj</span><span class="o">.</span><span class="n">file_hash</span>
+<span class="go">&#39;872b9b858f5f3e6bb6086fec7f05dd464b60eb26&#39;</span>
+</pre></div>
+</div>
+<p>You could recreate this hexadecimal hash yourself using the <a class="reference external" href="https://en.wikipedia.org/wiki/SHA-1">SHA-1 algorithm</a>.</p>
+<div class="highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">hashlib</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">hashlib</span><span class="o">.</span><span class="n">sha1</span><span class="p">(</span><span class="n">obj</span><span class="o">.</span><span class="n">pdf</span><span class="p">)</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()</span>
+<span class="go">&#39;872b9b858f5f3e6bb6086fec7f05dd464b60eb26&#39;</span>
+</pre></div>
+</div>
+</dd></dl>
+
 <dl class="attribute">
 <dt id="document_obj.full_text">
 <tt class="descclassname">document_obj.</tt><tt class="descname">full_text</tt><a class="headerlink" href="#document_obj.full_text" title="Permalink to this definition"></a></dt>
@@ -395,6 +420,12 @@ <h3><a href="index.html">Table Of Contents</a></h3>
 </li>
 </ul>
 
+<h4>Previous topic</h4>
+<p class="topless"><a href="gettingstarted.html"
+title="previous chapter">Getting started</a></p>
+<h4>Next topic</h4>
+<p class="topless"><a href="projects.html"
+title="next chapter">Projects</a></p>
 <h3>This Page</h3>
 <ul class="this-page-menu">
 <li><a href="_sources/documents.txt"
@@ -423,6 +454,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 >index</a></li>
+<li class="right" >
+<a href="projects.html" title="Projects"
+>next</a> |</li>
+<li class="right" >
+<a href="gettingstarted.html" title="Getting started"
+>previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>

docs/_build/html/genindex.html

Lines changed: 4 additions & 0 deletions
@@ -215,6 +215,10 @@ <h2 id="F">F</h2>
 <table style="width: 100%" class="indextable genindextable"><tr>
 <td style="width: 33%" valign="top"><dl>
 
+<dt><a href="documents.html#document_obj.file_hash">file_hash (document_obj attribute)</a>
+</dt>
+
+
 <dt><a href="documents.html#document_obj.full_text">full_text (document_obj attribute)</a>
 </dt>
 

docs/_build/html/gettingstarted.html

Lines changed: 21 additions & 1 deletion
@@ -23,7 +23,9 @@
 <script type="text/javascript" src="_static/jquery.js"></script>
 <script type="text/javascript" src="_static/underscore.js"></script>
 <script type="text/javascript" src="_static/doctools.js"></script>
-<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="next" title="Documents" href="documents.html" />
+<link rel="prev" title="python-documentcloud" href="index.html" />
 </head>
 <body>
 <div class="related">
@@ -32,6 +34,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 accesskey="I">index</a></li>
+<li class="right" >
+<a href="documents.html" title="Documents"
+accesskey="N">next</a> |</li>
+<li class="right" >
+<a href="index.html" title="python-documentcloud"
+accesskey="P">previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>
@@ -180,6 +188,12 @@ <h3><a href="index.html">Table Of Contents</a></h3>
 </li>
 </ul>
 
+<h4>Previous topic</h4>
+<p class="topless"><a href="index.html"
+title="previous chapter">python-documentcloud</a></p>
+<h4>Next topic</h4>
+<p class="topless"><a href="documents.html"
+title="next chapter">Documents</a></p>
 <h3>This Page</h3>
 <ul class="this-page-menu">
 <li><a href="_sources/gettingstarted.txt"
@@ -208,6 +222,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 >index</a></li>
+<li class="right" >
+<a href="documents.html" title="Documents"
+>next</a> |</li>
+<li class="right" >
+<a href="index.html" title="python-documentcloud"
+>previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>

docs/_build/html/objects.inv

26 Bytes
Binary file not shown.

docs/_build/html/otherobjects.html

Lines changed: 21 additions & 1 deletion
@@ -23,7 +23,9 @@
 <script type="text/javascript" src="_static/jquery.js"></script>
 <script type="text/javascript" src="_static/underscore.js"></script>
 <script type="text/javascript" src="_static/doctools.js"></script>
-<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="next" title="Changelog" href="changelog.html" />
+<link rel="prev" title="Projects" href="projects.html" />
 </head>
 <body>
 <div class="related">
@@ -32,6 +34,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 accesskey="I">index</a></li>
+<li class="right" >
+<a href="changelog.html" title="Changelog"
+accesskey="N">next</a> |</li>
+<li class="right" >
+<a href="projects.html" title="Projects"
+accesskey="P">previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>
@@ -186,6 +194,12 @@ <h3><a href="index.html">Table Of Contents</a></h3>
 </li>
 </ul>
 
+<h4>Previous topic</h4>
+<p class="topless"><a href="projects.html"
+title="previous chapter">Projects</a></p>
+<h4>Next topic</h4>
+<p class="topless"><a href="changelog.html"
+title="next chapter">Changelog</a></p>
 <h3>This Page</h3>
 <ul class="this-page-menu">
 <li><a href="_sources/otherobjects.txt"
@@ -214,6 +228,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 >index</a></li>
+<li class="right" >
+<a href="changelog.html" title="Changelog"
+>next</a> |</li>
+<li class="right" >
+<a href="projects.html" title="Projects"
+>previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>

docs/_build/html/projects.html

Lines changed: 21 additions & 1 deletion
@@ -23,7 +23,9 @@
 <script type="text/javascript" src="_static/jquery.js"></script>
 <script type="text/javascript" src="_static/underscore.js"></script>
 <script type="text/javascript" src="_static/doctools.js"></script>
-<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="top" title="python-documentcloud 0.2 documentation" href="index.html" />
+<link rel="next" title="Other objects" href="otherobjects.html" />
+<link rel="prev" title="Documents" href="documents.html" />
 </head>
 <body>
 <div class="related">
@@ -32,6 +34,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 accesskey="I">index</a></li>
+<li class="right" >
+<a href="otherobjects.html" title="Other objects"
+accesskey="N">next</a> |</li>
+<li class="right" >
+<a href="documents.html" title="Documents"
+accesskey="P">previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>
@@ -222,6 +230,12 @@ <h3><a href="index.html">Table Of Contents</a></h3>
 </li>
 </ul>
 
+<h4>Previous topic</h4>
+<p class="topless"><a href="documents.html"
+title="previous chapter">Documents</a></p>
+<h4>Next topic</h4>
+<p class="topless"><a href="otherobjects.html"
+title="next chapter">Other objects</a></p>
 <h3>This Page</h3>
 <ul class="this-page-menu">
 <li><a href="_sources/projects.txt"
@@ -250,6 +264,12 @@ <h3>Navigation</h3>
 <li class="right" style="margin-right: 10px">
 <a href="genindex.html" title="General Index"
 >index</a></li>
+<li class="right" >
+<a href="otherobjects.html" title="Other objects"
+>next</a> |</li>
+<li class="right" >
+<a href="documents.html" title="Documents"
+>previous</a> |</li>
 <li><a href="index.html">python-documentcloud 0.2 documentation</a> &raquo;</li>
 </ul>
 </div>
