Question for you guys:
Should 4chan proactively try to prevent scrapers from archiving /b/?
vote here: http://www.poll-maker.com/poll639462x891d44F3-26
>>677287882
bump
>>677287882
I think they should. Archiving /b ruins it
>>677287882
Bumpity
Who is this? I've seen her before
>>677287882
Let's hear your opinions /b/ros
>>677288350
agreed completely
If they could do it without any changes to the user experience, who would be against that?
bump with titties
>>677289420
vote guys
>>677289582
>>677287882
>http://www.poll-maker.com/poll639462x891d44F3-26
vote
>>677287882
What are scrapers?
Are all you anons cool with everything you post on /b/ being scraped and used by other sites to make money? Taken and used in ads and whatever else people want, without anyone knowing, under the misleading claim that all data is cleared from 4chan's servers? That claim is true, but scrapers move the content off 4chan's servers and save it to other servers.
>>677289947
scrapers are robots that copy all content from boards and save it offsite to other servers.
fgts.jp/b/ is an example. All content is copied and saved without expiration
>>677287882
How would it be done? Anyone can save any webpage, even if it's just screenshots
>>677289758
more of her?
>>677289758
no one cares?
anons who contribute OC, you are all aware that nothing is deleted right? 4chan may delete content from their servers, but that's long after scrapers have copied it offsite.
>>677290228
personal scraping is fine because of scale. Save to your hd all you want.
Scraping all of /b/ at all times is a different story. The content is then hosted on another website exactly as it's hosted here.
perfect example:
http://fgts.jp/b/thread/677287882
this board is being scraped as i post.
>>677290352
How is money being made? Jokes? Pictures? I can't make an informed decision without knowing more details. I don't see how this could even be prevented.
>>677290352
any ass pics plz?
How is any site safe from this? Have other sites devised methods for subverting this? What are the methods?
>>677290711
through ads on sites, through distribution of content across different amateur networks. Mostly ads though. How do you think images here end up in porn pop up ads?
As for prevention, it's hard to prevent all content from being stripped, but full images can be protected by not embedding the CDN link in the page. On thumbnail click, have a function call that retrieves the image, so it's not exposed on the board explicitly.
>>677290988
don't embed links in the pages explicitly. Have resources delivered via function calls. Server-side calls that return resources that have CORS enabled are a start.
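A minimal sketch of the "deliver via function call" idea, assuming the server hands out a short-lived signed token when a thumbnail is clicked and only serves the full image if the token checks out. The names `SECRET_KEY`, `TOKEN_TTL_SECONDS`, `issue_token`, and `verify_token` are made up for illustration; this is not anything 4chan is known to run.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

# Hypothetical server-side secret; it is never sent to the browser.
SECRET_KEY = b"example-secret"
# Assumed lifetime of a full-image link, in seconds.
TOKEN_TTL_SECONDS = 60


def sign_image_request(image_id: str, expires_at: int) -> str:
    """Return a hex HMAC-SHA256 over the image id and its expiry time."""
    message = f"{image_id}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(image_id: str, now: Optional[float] = None) -> Tuple[int, str]:
    """Called on thumbnail click: return (expiry timestamp, signature)."""
    current = int(time.time() if now is None else now)
    expires_at = current + TOKEN_TTL_SECONDS
    return expires_at, sign_image_request(image_id, expires_at)


def verify_token(image_id: str, expires_at: int, signature: str,
                 now: Optional[float] = None) -> bool:
    """Serve the full image only if the token is unexpired and untampered."""
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    expected = sign_image_request(image_id, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

This only raises the cost of scraping. A bot that goes through the same click flow can still ask for a token, so it would need something like rate limiting on top.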
>>677291140
>>677290988
this solution only works for full sized images. Hiding of text and thumbnails is definitely harder to do considering the board format. But considering 4chan is an image board, images are the place to start.
>>677290841
don't have any
>>677290696
damn that server is slow as hell
>>677291424
it may be slow at times, but it has >250,000 pages of /b threads
>>677291140
>>677291025
OK so you're mainly concerned with images, not written content. Is there any way to prevent text from being scraped?
I imagine bots could also take screenshots repeatedly and have everything by means of screenshots.
>>677291810
Images can be put into a control instead of straight HTML. Normal scrapers just traverse the HTML and copy resources that way. So if you can remove the HTML and substitute different containers, that's a good start.
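To show what "traverse html and copy resources" means, here is a rough sketch using only Python's standard `html.parser`. The `SAMPLE_PAGE` markup and the `data-image-id` attribute are invented for the example. The point is that anything written into `src` or `href` is trivially visible, while an id that is resolved later by a function call is not picked up by this kind of walk.

```python
from html.parser import HTMLParser
from typing import List


class ResourceCollector(HTMLParser):
    """Collects every src/href value found in start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def collect_urls(html: str) -> List[str]:
    """Return all src/href values in document order."""
    parser = ResourceCollector()
    parser.feed(html)
    return parser.urls


# Hypothetical page: one embedded full-size link, one thumbnail,
# and one image exposed only as an opaque id.
SAMPLE_PAGE = (
    '<a href="/img/full.jpg"><img src="/img/thumb.jpg"></a>'
    '<div data-image-id="42"></div>'
)

print(collect_urls(SAMPLE_PAGE))
```

The opaque id never shows up in the collected list, which is the whole argument for not embedding the link. A scraper that runs the page's scripts would still get it.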
Screenshots aren't the same as html. Taking pictures of boards is actually pretty lame and no one is going to want to look through those pictures. Also, that means the bots would have to actually render the html and then screenshot it. So they'd have to have an environment where they could do that. You can't do that in a normal terminal. You'd need a headless browser or something.
no one is voting...
come on guys!
ok guys. Just keep in mind, nothing you post on /b goes away and any oc you are providing is going to be taken and used without your consent.
>>677292235
Why would taking screenshots be lame? Serious question. I understand that the text wouldn't be analyzable the same way it would be with HTML but what do you mean by calling it lame?
>>677294181
you couldn't expand images, and you couldn't search images without snipping them out with some screenshot tool, which then blows the image hash up. I mean, it may be fine for some people, but in terms of copying, it's just a picture, not a copy.
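A quick illustration of the hash point, assuming images are matched by a hash of their raw bytes. SHA-256 is used here only as an example and the byte strings are stand-ins, not real image files. Any re-capture that changes even one byte gives a completely different digest, so a snipped screenshot no longer matches the original file.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the bytes of an original image file.
original = bytes(range(256)) * 4
# Stand-in for a re-captured version: identical except for the last byte.
recaptured = original[:-1] + b"\x00"

print(sha256_hex(original) == sha256_hex(recaptured))
```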
>>677294377
also, you lose all ability to cross reference.
Bump
saging with dicks. Stop trying to control information assholes. Go work for the NSA or something
Resaging and one more thing: the only people who care about archiving is chicks who are mad when their nudes get leaked, as if we remember your faces post-load anyway
>>677295931
and people who post OC. You faggots that don't contribute don't give a fuck, but people who contribute shit thinking it will be deleted get fucked.
fuck your saging too
Nah I love fgts it's allowed me to find nudes of about 3 girls I know. As soon as you get a lead the whole set is yours it's fucking glorious
>>677294377
>in terms of copying, it's just a picture, not a copy
So? What is the significance of having a copy? It's still all the same text included. The images would be screenshots of thumbnails but it would be better than nothing for the people scraping.
If I was someone looking through archives, I wouldn't mind if each thread was stored as an image or series of images as long as they were organized
Do you think the best that can be done is to just prevent images from being copied exactly? But the text is pretty much unprotectable (albeit not analyzable)?
I voted yes by the way
>>677296246
Then let this be the least of their wake-up calls to the reality of the internet. NOTHING GETS DELETED IDIOT. This is the same mentality where people get mad at McDonalds when they get fat. Fucking don't go there you goddamn mouth breathing bite plate autists. Why the furry fuck would someone come to /b/ of all goddamn places and look for anything except the worst-case scenario? Idiots can't be saved from themselves with more censorship.
it's tied folks.
>>677295654
Where'd you get that gif?
>>677295654
deleting data is the opposite of what the NSA does. Autistic detected
>>677296246
Why would someone who doesn't contribute give a fuck? Why would you expect that? Of course we don't give a fuck. You're not too fucking bright are you?
>>677296620
stuff gets deleted all the time. CP, ID info, etc. mods do it all the time.
4chan is legally obliged to keep records anyway. Your votes don't matter. Take it up with your terrorism countermeasures overlords.
>>677297020
Deleting is control, dumbass. You're a control freak
>>677297052
learn to read. jesus christ
>>677290051
>all you anons are cool with everything you post on /b/ being scraped and used by other sites to make money?
way it goes.
you can be a whiney little /b/itch about it or you can just accept that's the way it goes.
>>677296291
exactly. It allows lurkers to go through old shit and resurrect it. Why not just save it to your hd and not use the archives?
>>677297187
Does that save it from scrapers? Not usually. This is an aside to the conversation.
>>677297280
I am reading your shit. Stop dodging the question. Why would you expect people who don't contribute OC to give a fuck?
>>677297343
it doesn't have to be that way though. If it could be changed and not one user on 4chan notices the difference, what's wrong with that?
>>677297469
no but scrapers can still save it and there are practically 0 mods on the scraper sites. So content deleted by 4chan mods can leak out.
We're already saving everything we like, and there are bots to do it as well. Asking the admins to get involved in censorship even more is a bad direction. Fuck the need for control. Don't post if you don't want it out there. Sounds pretty simple.
>>677297474
i don't because i don't give a shit about those people. People who contribute to 4chan make it what it is, not lurkers.
>>677297596
So all the cp goes there?
Why is it so bad to have the data go to scrapers?
>>677297510
>https://m.youtube.com/watch?v=Sxvd8NEd_C8
>>677297510
>If it
if I had wings I could fly.
I could be a whiney little /b/itch about not having wings or I could accept that's the way it goes.
>good luck getting those wings, anon
>>677297752
Some cp goes there if the mods can't remove it fast enough.
Rule 14. The use of scrapers, bots, or other automated posting or downloading scripts is prohibited. Users may also not post from proxies, VPNs, or Tor exit nodes.
It goes against the rules.
It's a transparency thing.
I'm just asking the question to other /b/ros. How do you all feel about this, hence the poll.
>>677298102
Why is it a rule to begin with? Did moot make that rule?
we're getting some number now
>>677298102
>How do you all feel about this,
way it goes.
if you don't like it, don't use 4chin. plenty of room for you on reddddit or an MLP community.
>>677298072
You lack imagination
>>677298363
>You lack imagination
you lack any business sense and any concept of reality
way it goes
>>677298277
moot was against it.
>>677298360
You're too proud to admit that you don't know the answers. It clearly marks out the limits of your intelligence. You're too afraid to even try.
>>677298475
>turtle soup.gif
by the way, the bird is you all naively thinking "oh look, pretty turtle"
>>677298475
Scrapers are actually detrimental to 4chan because 4chan has to handle all the bots requesting data from its servers, which costs money. Also, all the site statistics get thrown off because bots are constantly connecting to the site. If the scrapers or crawlers are legit, they should respect the site's robots.txt file, but obviously, those rules are easy to ignore.
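For reference, this is roughly how a well-behaved crawler checks robots.txt before fetching, using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` rules, the `example.org` URLs, and the `ExampleBot` user agent are hypothetical and are not 4chan's actual file. Nothing enforces this check; a scraper can simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents that ask all crawlers to stay out of /b/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /b/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/b/thread/1"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/g/"))
```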
>>677298812
What do you stand to gain from this?
>>677299309
nothing. I just want 4chan to be the way i thought it was. I'm a software engineer and I write p2p/encryption open source stuff, and this bothers me.
>>677299548
So as an open source coder... why would this bother you?
>>677299642
because 4chan's original goal was to allow things to come and go. To be shown then removed. And that's not happening. It's all getting saved permanently. That bothers me.
It also bothers me that other sites copy content from 4chan and then use it for personal gain. 4chan mods are pretty responsible but once the data is copied then new mods take over, and those mods are fucking trash.
>>677299914
How can you prove that you aren't trash?
>>677298548
OP is massive faggot.
>>677300097
i don't have to prove anything. I'm just asking a question to /b.
>>677300229
No you definitely should prove that you're not trash. Those mods are people just like you and me. You shouldn't call them trash.
>>677300479
if you reread the comment, i said mods on the archive sites are trash.
>>677300767
Yes, but they're still people, somebody's sons or daughters.
>>677300953
well, they suck at their jobs. Nothing personal
>>677300953
example
>>677298548
faggot, you need the power to do it. and the powers that by have made their choice.
you're like a 4th grade girl who runs for class president on the platform that you'll double the number of recesses and get Hawaiian Punch in the drinking fountains instead of water.
No matter how good the idea may be, if you don't have the power to make the decision and implement it, ain't nothing going to change.
>>677300953
another
>>677298812
>those rules are easy to ignore.
and there you have it
>it's pointless faggots, grow up and move on
>>677301856
Sounds pretty glib
>>677301895
read the rest of the posts faggot. The pages can be changes so they can't be scraped
>>677303057
learn to life.
the powers that /b/ don't want to make the changes.
this isn't a democracy faggot. if you're going to daydream about changes YOU can't make, day dream about something cool like being able to breathe fire if you want.
>>677303849
ok neckbeard. learn to write
>>677304303
are you seriously this new?
>faggot is obviously seriously this new
>>677287882
Those perfect tits wasted on a shaved head, pierced lip SJW shitbag.
numbers so far