Discussion:
[liberationtech] data mine the snowden files [was: open the snowden files]
coderman
2014-07-08 19:08:19 UTC
...
the snowden files are of public interest. but only a small circle of
people is able to access, read, analyze, interpret and publish them. and
only a very small percentage of those files has been made available to
the public...
what can be done about this situation? are we able to find a way to
"open" this data? and in the course of this create a model for future
leaks?
..
prior to my intervention harding had already hinted at some very obvious
limitations of the ongoing investigation, alluding to various reasons
why those "few lucky ones" are incapable of dealing with the investigation
challenge in an appropriate manner: "we are not technical experts" or
"after two hours your eyes pop out". in spite of this, harding seemed
unprepared to reflect on the possibility of opening up the small circle of
analysts dealing with the snowden files.
an impasse of extremes, a full or limited dump off the table.

let's find a middle ground. how best to proceed?
* last but not least: one should work out a concept/model for
transferring those files into the public domain -- taking also into
account the obvious problems of "security" and "government pressure".
it would be great if we could start a debate about this in order to build a
case for the future of handling big data leaks in a more democratic and
sustainable manner.
very great indeed. what kind of tools would make the journalists
involved more effective and productive?

1. using the leaks currently published, devise a framework for "data
mining" the leak documents, aka, generating metadata from the data and
operating various matches and relevance across the metadata to narrow
the search and aggregate related efforts or technologies across their
compartmentalized worlds.

2. #1 requires that there is an index of special terms, techniques,
suppliers, code names, algorithms, etc. that is used to generate the
metadata for deeper search and tie to general themes of surveillance.

3. extrapolating from current leaks, also look toward recent
advancements and specific technical tell tales of interest. doping
silicon as tailored access technique? this could refer to compromised
runs of security processors for desired targets. etc.

4. justifying technical detail specifically. we have seen so little
technical detail at the source code / hardware design level. how best
to justify source code - explaining that the language choice, the
nature of the algorithms, and the structure of the distributed computing
upon which it runs all convey critical technical details important for
understanding which parts of our technologies are compromised, and for
guiding the fixes required to protect against such compromises?
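
Points 1 and 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual tooling: the term index is a tiny placeholder (two publicly reported code names plus two generic technique terms), and the function names are hypothetical. It tags each document with the index terms it mentions, builds a term-to-document map, and counts which terms show up together.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical seed index: category -> terms. A real index would be built
# from the published documents; these entries are placeholders only.
TERM_INDEX = {
    "code_name": ["PRISM", "XKEYSCORE"],
    "technique": ["tailored access", "implant"],
}

def tag_document(text, term_index=TERM_INDEX):
    """Return {category: set of matched terms} for one document."""
    tags = defaultdict(set)
    for category, terms in term_index.items():
        for term in terms:
            # whole-word, case-insensitive match
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                tags[category].add(term)
    return dict(tags)

def build_inverted_index(docs, term_index=TERM_INDEX):
    """Map each matched term to the set of document ids that mention it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for terms in tag_document(text, term_index).values():
            for term in terms:
                inverted[term].add(doc_id)
    return dict(inverted)

def related_terms(inverted):
    """Count the documents shared by each pair of terms (co-occurrence)."""
    pairs = {}
    for a, b in combinations(sorted(inverted), 2):
        shared = inverted[a] & inverted[b]
        if shared:
            pairs[(a, b)] = len(shared)
    return pairs
```

Feeding `build_inverted_index` a dict of `{doc_id: text}` and passing the result to `related_terms` gives a crude first cut at which programs or techniques appear in the same documents; anything beyond exact term matching (aliases, OCR noise, redactions) would need more work.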



in short, it would behoove us to build tools to make the journalists
more effective, rather than bitch about not being included in the
inner circle. (sadly, many good knowledge discovery tools are
proprietary and applied to open source intelligence)


what types of features would you want such a leak-assistant software
to have? what types of existing tools, if any, would provide these
capabilities?


best regards,
Griffin Boyce
2014-07-08 20:05:36 UTC
One approach is to take the existing public data, make some assumptions (educated guesses) and do additional research on top of that. It's what I'm doing right now. It's also what led to the original cointelpro revelations. Before the follow-up research, it was a meaningless acronym.

Find, extrapolate, expand.

~ Griffin
--
Sent from my tracking device. Please excuse brevity and cat photos.
coderman
2014-07-08 20:11:44 UTC
Post by Griffin Boyce
One approach is to take the existing public data, make some assumptions
(educated guesses) and do additional research on top of that. It's what I'm
doing right now. It's also what led to the original cointelpro revelations.
Before the follow-up research, it was a meaningless acronym.
Find, extrapolate, expand.
hi Griffin!

this is the type of effort i was hoping to see undertaken.

when you say "additional research", is this organic or structured?
tool assisted or old skewl?

i too have been building up some terms and technologies, but yet to
put it into any structured format with context, as part of my post is
to see how others are handling the vast complexity and extensive
compartmentalization embodied in the leaks to date.

i also would like to pursue this research anonymously, on hidden
services rather than public sites or email.


best regards,
Griffin Boyce
2014-07-08 22:11:39 UTC
Post by coderman
hi Griffin!
this is the type of effort i was hoping to see undertaken.
Me too ^_^ eventually I realized I'd have to do it myself if I wanted more info on Topic X. I obviously don't have access to the source, but there are some clear ways to expand on the material that's been released.
Post by coderman
when you say "additional research", is this organic or structured?
tool assisted or old skewl?
Right now, the aspect I'm researching requires lots of structured research, but I fully expect to come across something unexpected (a specific sourcing pattern, perhaps).

Manual desk research is the new hotness. Well... maybe not. ;) It helps that I'm really good at it, so it doesn't take as much drudgery. Once collected, some things are trimmed and cleaned up using custom tools. But data collection is all manual.
Post by coderman
i too have been building up some terms and technologies, but yet to
put it into any structured format with context, as part of my post is
to see how others are handling the vast complexity and extensive
compartmentalization embodied in the leaks to date.
Nice! :D I'd love to hear more about your conclusions sometime.

I started by looking at one narrow outcome of the NSA's work that I find horribly disruptive to the ecosystem around my work. Now my task is to find further proof of this activity using unclassified source material and possibly patterns within their work in this area.
Post by coderman
i also would like to pursue this research anonymously, on hidden
services rather than public sites or email.
Indeed. Lots of excellent reasons to be light on detail in these types of public forums.

~ Griffin
--
Sent from my tracking device. Please excuse brevity and cat photos.
grarpamp
2014-07-08 22:27:18 UTC
Post by coderman
Post by Griffin Boyce
One approach is to take the existing public data, make some assumptions
(educated guesses) and do additional research on top of that. It's what I'm
doing right now. It's also what led to the original cointelpro revelations.
Before the follow-up research, it was a meaningless acronym.
Find, extrapolate, expand.
this is the type of effort i was hoping to see undertaken.
when you say "additional research", is this organic or structured?
tool assisted or old skewl?
i too have been building up some terms and technologies, but yet to
put it into any structured format with context, as part of my post is
to see how others are handling the vast complexity and extensive
compartmentalization embodied in the leaks to date.
i also would like to pursue this research anonymously, on hidden
services rather than public sites or email.
To do any of this you will need to collect all the releases of docs
and images to date, in their original format (not AP newsspeak),
in one place. Then dedicate much time to normalizing, convert to
one format and import into tagged document store, etc. Yes, this
could be hosted on the darknet.
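
A minimal sketch of the normalize-and-import step for plain-text releases, assuming SQLite as the tagged document store (the table layout and function names are hypothetical, and PDFs or images would first need separate extraction tooling). Documents are keyed by the hash of their original bytes, so re-importing the same release from another mirror does not create a duplicate.

```python
import hashlib
import sqlite3
import unicodedata

def normalize_text(raw):
    """Decode bytes, unify Unicode form and line endings, trim trailing space."""
    text = raw.decode("utf-8", errors="replace")
    text = unicodedata.normalize("NFC", text)
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip() + "\n"

def open_store(path=":memory:"):
    """Open (or create) the document store; in-memory by default."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS docs ("
               "sha256 TEXT PRIMARY KEY, source TEXT, body TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS tags ("
               "sha256 TEXT, tag TEXT, UNIQUE(sha256, tag))")
    return db

def import_doc(db, raw, source, tags=()):
    """Store one normalized document keyed by the hash of its original bytes."""
    digest = hashlib.sha256(raw).hexdigest()
    db.execute("INSERT OR IGNORE INTO docs VALUES (?, ?, ?)",
               (digest, source, normalize_text(raw)))
    db.executemany("INSERT OR IGNORE INTO tags VALUES (?, ?)",
                   [(digest, t) for t in tags])
    db.commit()
    return digest

def docs_with_tag(db, tag):
    """List the sources of all stored documents carrying the given tag."""
    rows = db.execute("SELECT d.source FROM docs d JOIN tags t "
                      "ON d.sha256 = t.sha256 WHERE t.tag = ? ORDER BY d.source",
                      (tag,))
    return [r[0] for r in rows]
```

Keeping the hash of the untouched original alongside the normalized body means the store can always be checked back against the source files.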
coderman
2014-07-09 14:04:06 UTC
Post by grarpamp
...
To do any of this you will need to collect all the releases of docs
and images to date, in their original format (not AP newsspeak),
in one place. Then dedicate much time to normalizing, convert to
one format and import into tagged document store, etc. Yes, this
could be hosted on the darknet.
indeed. i will also be hosting the complete cryptome archive on hidden
site, as it too is part of this corpus to feed into a normalization
and extraction engine of great justice. i am using the various python
image processing libraries to accomplish this but any language or tool
could be useful.

i had hoped to distribute the cryptome archives further during the
Paris hackfest, alas, unexpected events conspired otherwise.

anyone who would like to host mirrors is welcome to tell me how they
anticipate mirroring ~30G of data as quickly as possible. :)
edhelas
2014-07-09 14:58:05 UTC
What about a Torrent ? We can easily share the magnet everywhere
(Reddit, Twitter?).
Post by coderman
Post by grarpamp
...
To do any of this you will need to collect all the releases of docs
and images to date, in their original format (not AP newsspeak),
in one place. Then dedicate much time to normalizing, convert to
one format and import into tagged document store, etc. Yes, this
could be hosted on the darknet.
indeed. i will also be hosting the complete cryptome archive on hidden
site, as it too is part of this corpus to feed into a normalization
and extraction engine of great justice. i am using the various python
image processing libraries to accomplish this but any language or tool
could be useful.
i had hoped to distribute the cryptome archives further during the
Paris hackfest, alas, unexpected events conspired otherwise.
anyone who would like to host mirrors is welcome to tell me how they
anticipate mirroring ~30G of data as quickly as possible. :)
Eugen Leitl
2014-07-09 15:16:54 UTC
Post by edhelas
What about a Torrent ? We can easily share the magnet everywhere
(Reddit, Twitter?).
Great idea. Don't forget to publish the magnet on TPB as well.
Nick
2014-07-09 15:37:25 UTC
Post by edhelas
What about a Torrent ? We can easily share the magnet everywhere
Note that there is a torrent of the cryptome archive up to 2011:
magnet:?xt=urn:btih:ba401110a60ad844a09d4219e5f95a46385f7410

But yes, bittorrent seems like a reasonable way to distribute this
sort of stuff. That said, it is not anonymous, so a hidden service
as the originating place seems sensible.
Griffin Boyce
2014-07-09 21:46:42 UTC
Post by Nick
Post by edhelas
What about a Torrent ? We can easily share the magnet everywhere
magnet:?xt=urn:btih:ba401110a60ad844a09d4219e5f95a46385f7410
But yes, bittorrent seems like a reasonable way to distribute this
sort of stuff. That said, it is not anonymous, so a hidden service
as the originating place seems sensible.
Also keep in mind that it's possible to spy on who downloads these
just by seeding the torrent and monitoring connections to your box. So
it's certainly not anonymous. I'd say hidden service first, a website
second, and torrent third.

~Griffin
--
Wherever truth, love and laughter abide, I am there in spirit.
-Bill Hicks
Natanael
2014-07-09 16:18:50 UTC
FYI, anonymous torrenting is possible over I2P. While slower than regular
torrenting, it works fine and doesn't need any high capacity servers.

- Sent from my tablet
Post by Griffin Boyce
Post by Nick
Post by edhelas
What about a Torrent ? We can easily share the magnet everywhere
magnet:?xt=urn:btih:ba401110a60ad844a09d4219e5f95a46385f7410
But yes, bittorrent seems like a reasonable way to distribute this
sort of stuff. That said, it is not anonymous, so a hidden service
as the originating place seems sensible.
Also keep in mind that it's possible to spy on who downloads these
just by seeding the torrent and monitoring connections to your box. So
it's certainly not anonymous. I'd say hidden service first, a website
second, and torrent third.
~Griffin
--
Wherever truth, love and laughter abide, I am there in spirit.
-Bill Hicks
--
Liberationtech is public & archives are searchable on Google. Violations
https://mailman.stanford.edu/mailman/listinfo/liberationtech.
Unsubscribe, change to digest, or change password by emailing moderator at
companys at stanford.edu.
Aymeric Vitte
2014-07-09 17:29:20 UTC
Solving these issues is the purpose of [1] and [2]: the files could
be split into reasonable sizes and put inside the initial anonymous
seeder's browser or Peersm client, then downloaded and shared
anonymously by others using their hash_names (kind of infohash)

But not in the current phase for sensitive documents since right now
it's not a pure p2p

[1] http://www.peersm.com
[2] https://github.com/Ayms/node-Tor#anonymous-serverless-p2p-inside-browsers---peersm-specs
Post by Griffin Boyce
Post by Nick
Post by edhelas
What about a Torrent ? We can easily share the magnet everywhere
magnet:?xt=urn:btih:ba401110a60ad844a09d4219e5f95a46385f7410
But yes, bittorrent seems like a reasonable way to distribute this
sort of stuff. That said, it is not anonymous, so a hidden service
as the originating place seems sensible.
Also keep in mind that it's possible to spy on who downloads these
just by seeding the torrent and monitoring connections to your box. So
it's certainly not anonymous. I'd say hidden service first, a website
second, and torrent third.
~Griffin
--
Peersm : http://www.peersm.com
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
coderman
2014-07-10 01:35:50 UTC
Tag the Cryptome Archive: "This is a trap, witting and unwitting. Do not use
it or use at own risk. Source and this host is out to pwon and phuck you in
complicity with global Internet authorities. Signed Batshit Cryptome and
Host, 9 July 2014, 12:16ET."
see attached. onion before torrent; rest TBD.
also: http://cryptome.org/donations.htm


best regards,
-------------- next part --------------

Cryptome Donation Required
- http://cryptome.org/donations.htm

"This is a trap, witting and unwitting.
Do not use it or use at own risk.
Source and this host is out to pwon and phuck you in complicity
with global Internet authorities.

Signed Batshit Cryptome and Host,
9 July 2014, 12:16ET."

Index:
0eb8551d977dde4f4193b3a16dedcd18f01e854e371e96623d33dd5b9519e413 *USB-1.rar
9653d105293b9f77d5b0067d51a35ed286a7f50a0b37b3ea2bd78c092caab584 *USB-2.rar
7e798bb2b09cac49181aa7c12170e03fc3d3cf69a73d9e1b04171c80910e7525 Update-13-1231.rar
80652978f46ef6e6f26bd2bec406349ef766ad1722fc81d9f7575148edc6324f wikileaks-bank-julius-baer.zip
c56f0fd30924f7398ca9e20c098acced50766d3325754f29014dd33029ebf351 wikileaks-safekeep-to-08-0210.zip
9d2aa03048c60eec2c94d45293d4e95977a94f3477a4701f6ee2ef7ec888a7c9 WikiLeaks-State-Dept-Cables-xyz.zip
*- these files have a detached signature from key
0xB650572B8B3BF75C "Cryptome <cryptome at earthlink.net>"

-------------- next part --------------
coderman
2014-07-11 00:29:42 UTC
-------------- next part --------------

Cryptome Donation Required
- http://cryptome.org/donations.htm

Donation also provides current archive as this selection is not current,
and increasingly out of date by the day.

-

"This is a trap, witting and unwitting.
Do not use it or use at own risk.
Source and this host is out to pwon and phuck you in complicity
with global Internet authorities.

Signed Batshit Cryptome and Host,
9 July 2014, 12:16ET."

-

Index:
0eb8551d977dde4f4193b3a16dedcd18f01e854e371e96623d33dd5b9519e413 *USB-1.rar
9653d105293b9f77d5b0067d51a35ed286a7f50a0b37b3ea2bd78c092caab584 *USB-2.rar
7e798bb2b09cac49181aa7c12170e03fc3d3cf69a73d9e1b04171c80910e7525 Update-13-1231.rar
b63e185c21232724f9c90238496b9122a46d492752d56f690200fab6fe9fb6ed Update-14-0206-0602.tar.rar
6e5146b4c53f61b555822eda90e70a20a8050fe3dbf0bd3a084a042a36bdd3b1 Cryptome-Update-13-0701-to-13-1202.tgz
80652978f46ef6e6f26bd2bec406349ef766ad1722fc81d9f7575148edc6324f wikileaks-bank-julius-baer.zip
c56f0fd30924f7398ca9e20c098acced50766d3325754f29014dd33029ebf351 wikileaks-safekeep-to-08-0210.zip
9d2aa03048c60eec2c94d45293d4e95977a94f3477a4701f6ee2ef7ec888a7c9 WikiLeaks-State-Dept-Cables-xyz.zip
*- these files have a detached signature by presumed key
0xB650572B8B3BF75C "Cryptome <cryptome at earthlink.net>"

-

Recommended Usage:
# Requires Tor running and http proxy to Tor at 127.0.0.1:8123
# (the port must match --all-proxy below)
export onions="sek42kxkbjuivxws.onion ajzxwgtrtws7zwyg.onion wpv2bxujoctsmzcn.onion aiyu6uyckomxt2ld.onion kvrvzxgdutjcjxqw.onion hz5sj76rh3avsmfc.onion jt7klzczup6hrtes.onion 3qcs4cqbsrfdz7xa.onion"
export files="Update-13-1231.rar Update-14-0206-0602.tar.rar USB-1.rar USB-2.rar wikileaks-bank-julius-baer.zip wikileaks-safekeep-to-08-0210.zip WikiLeaks-State-Dept-Cables-xyz.zip Cryptome-Update-13-0701-to-13-1202.tgz"
for cfile in $files; do
  # build the list of mirror URLs for this file, one per onion host
  export olist=""
  for chost in $onions; do
    export olist="${olist} http://${chost}/cryptome-july2014/${cfile}"
  done
  echo "Retrieving $cfile ..."
  # every URL points at the same file, so aria2c uses them as mirrors
  aria2c \
    --all-proxy=127.0.0.1:8123 \
    --continue=true --always-resume=true \
    --retry-wait=30 --timeout=120 \
    --summary-interval=3 \
    --max-connection-per-server=2 --max-concurrent-downloads=8 \
    -o "$cfile" $olist
done

-------------- next part --------------
coderman
2014-07-12 05:19:20 UTC
added example privoxy config as http_proxy to Tor, add sig note for Update 13.
no further updates on list; contact direct if issues encountered.

best regards,
-------------- next part --------------

Cryptome Donation Required
- http://cryptome.org/donations.htm

Donation also provides current archive as this selection is not current,
and increasingly out of date by the day.

-

"This is a trap, witting and unwitting.
Do not use it or use at own risk.
Source and this host is out to pwon and phuck you in complicity
with global Internet authorities.

Signed Batshit Cryptome and Host,
9 July 2014, 12:16ET."
- https://cpunks.org//pipermail/cypherpunks/2014-July/005020.html

-

Index:
0eb8551d977dde4f4193b3a16dedcd18f01e854e371e96623d33dd5b9519e413 *USB-1.rar
9653d105293b9f77d5b0067d51a35ed286a7f50a0b37b3ea2bd78c092caab584 *USB-2.rar
7e798bb2b09cac49181aa7c12170e03fc3d3cf69a73d9e1b04171c80910e7525 *Update-13-1231.rar
b63e185c21232724f9c90238496b9122a46d492752d56f690200fab6fe9fb6ed Update-14-0206-0602.tar.rar
6e5146b4c53f61b555822eda90e70a20a8050fe3dbf0bd3a084a042a36bdd3b1 Cryptome-Update-13-0701-to-13-1202.tgz
80652978f46ef6e6f26bd2bec406349ef766ad1722fc81d9f7575148edc6324f wikileaks-bank-julius-baer.zip
c56f0fd30924f7398ca9e20c098acced50766d3325754f29014dd33029ebf351 wikileaks-safekeep-to-08-0210.zip
9d2aa03048c60eec2c94d45293d4e95977a94f3477a4701f6ee2ef7ec888a7c9 WikiLeaks-State-Dept-Cables-xyz.zip
*- these files have a detached signature by presumed key
0xB650572B8B3BF75C "Cryptome <cryptome at earthlink.net>"
append '.sig' for signature files.

-

Recommended usage:
# apt-get install privoxy tor
nano /etc/privoxy/config
--- begin-cut /etc/privoxy/config ---
# Tor Privoxy configuration
# NOTE: toggle=0 disables all privacy rewrite protections
toggle 0
confdir /etc/privoxy
logdir /var/log/privoxy
logfile logfile
hostname hostname.example.org
listen-address 127.0.0.1:8118
enable-remote-toggle 0
enable-remote-http-toggle 0
enable-edit-actions 0
enforce-blocks 0
forwarded-connect-retries 0
accept-intercepted-requests 0
allow-cgi-request-crunching 0
split-large-forms 0
keep-alive-timeout 5
socket-timeout 300
max-client-connections 256
#
# for Tor browser bundle
#forward-socks5 / 127.0.0.1:9150 .
# for Tor upstream
forward-socks5 / 127.0.0.1:9050 .
--- end-cut ---

Aria2 download:
# Requires Tor running and http proxy to Tor at 127.0.0.1:8118
export onions="sek42kxkbjuivxws.onion ajzxwgtrtws7zwyg.onion wpv2bxujoctsmzcn.onion aiyu6uyckomxt2ld.onion kvrvzxgdutjcjxqw.onion hz5sj76rh3avsmfc.onion jt7klzczup6hrtes.onion 3qcs4cqbsrfdz7xa.onion"
export files="Update-13-1231.rar Update-14-0206-0602.tar.rar USB-1.rar USB-2.rar wikileaks-bank-julius-baer.zip wikileaks-safekeep-to-08-0210.zip WikiLeaks-State-Dept-Cables-xyz.zip Cryptome-Update-13-0701-to-13-1202.tgz"
for cfile in $files; do
  # build the list of mirror URLs for this file, one per onion host
  export olist=""
  for chost in $onions; do
    export olist="${olist} http://${chost}/cryptome-july2014/${cfile}"
  done
  echo "Retrieving $cfile ..."
  # every URL points at the same file, so aria2c uses them as mirrors
  aria2c \
    --all-proxy=127.0.0.1:8118 \
    --continue=true --always-resume=true \
    --retry-wait=30 --timeout=120 \
    --summary-interval=3 \
    --max-connection-per-server=2 --max-concurrent-downloads=8 \
    -o "$cfile" $olist
done
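
Once the files are retrieved, the digests in the Index can be checked locally. The sketch below is an illustration only: it assumes the 64-hex-digit values are SHA-256 digests (the index does not name the algorithm), and the function names are hypothetical. It does not check the detached '.sig' signatures, which would need GnuPG and the Cryptome key.

```python
import hashlib
from pathlib import Path

def parse_index(index_text):
    """Parse 'HASH name' lines; a leading '*' on the name marks a signed file."""
    expected = {}
    for line in index_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 64:
            expected[parts[1].lstrip("*")] = parts[0].lower()
    return expected

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte archives fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(directory, expected):
    """Return {name: 'ok' | 'MISMATCH' | 'missing'} for each indexed file."""
    results = {}
    for name, digest in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_file(path) == digest:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results
```

Paste the Index lines into a string, pass it to `parse_index`, and call `verify` on the download directory; any result other than 'ok' means the file should be fetched again from a different mirror.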

-------------- next part --------------
