
Index Scrape/Crawl of Solidworks Forum

Posted: Thu Mar 25, 2021 8:33 am
by jmongi
Before the "transition", would it be feasible to scrape/index the current solidworks userforum to at least get it into a useable/searchable file that could be messed around with by future more ambitious programmers?

I'm just assuming that it will be much easier to do (if it's possible) in its current incarnation than attempting to do anything like that after the transition to the swamp. Just a thought for those with way more programming experience than me.

I would think you could write a script to systematically navigate through threads, pull the source (HTML?), copy it to a file, and repeat. As I said, the information on its own might not be very usable in that type of format. But then it would at least be available to be transformed in the future.
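The script idea above (navigate the threads, save each page's HTML to a file, repeat) can be sketched as a small breadth-first crawler. This is a minimal illustration using only the Python standard library, not a tool anyone in this thread used. The start URL and the `is_thread` test for which pages count as threads are hypothetical, since the forum's real URL layout isn't given here. The sketch stays on the starting host, pauses `delay` seconds between requests, and stops after `max_pages` fetches; it does not handle logins, JavaScript-rendered pages, or downloading attachments.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, is_thread, out_dir, fetch=fetch_url, delay=1.0, max_pages=100):
    """Breadth-first crawl that stays on start_url's host.

    Pages for which is_thread(url) is true are written to out_dir as raw
    HTML. is_thread is a hypothetical hook: the real thread-URL pattern
    is not known here. Returns the saved file paths in crawl order.
    """
    host = urlparse(start_url).netloc
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    queue = deque([start_url])
    seen = {start_url}
    saved = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if is_thread(url):
            path = out / f"page_{len(saved):05d}.html"
            path.write_text(html, encoding="utf-8")
            saved.append(path)
        # Queue same-host links not seen yet, ignoring #fragments.
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return saved
```

For example, `crawl("https://forum.example.com/", lambda u: "/thread/" in u, "archive")` would save every page whose URL contains `/thread/` (both values are placeholders). The output is raw HTML, which matches the point above: it isn't very usable as-is, but it could be parsed or indexed later. Passing `fetch` as a parameter keeps the crawl logic testable without network access.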

Re: Index Scrape/Crawl of Solidworks Forum

Posted: Thu Mar 25, 2021 8:59 am
by SPerman
I am hoping the Wayback Machine will take care of that for us.

https://web.archive.org/web/20201202134 ... solidworks

Re: Index Scrape/Crawl of Solidworks Forum

Posted: Thu Mar 25, 2021 10:52 am
by matt
jmongi wrote: Thu Mar 25, 2021 8:33 am
Before the "transition", would it be feasible to scrape/index the current solidworks userforum to at least get it into a useable/searchable file that could be messed around with by future more ambitious programmers?

I'm just assuming that it will be much easier to do (if it's possible) in its current incarnation than attempting to do anything like that post transition to the swamp. Just a thought for those with way more programming experience than me.

I would think you could write a script to systematically navigate through threads, pull the source (HTML?) copy it to a file and repeat. As I said, the information on its own might not be very usable in that type of format. But, then it would at least be available to be transformed in the future.

There are some software packages that do this. In the first week we were up, we had a minor scandal where someone actually started doing that and then posted the results here. One of the SW Forum's terms of use is that you cannot post its content publicly anywhere else. So that pretty much covers that.

But they can't control (or, more importantly, litigate) whether you give an account of the same content in your own words. (Basically, don't copy/paste anything. You can summarize, or elaborate on this or that, but please don't copy or scrape content and then paste it here.) I want us to make it on our own merits rather than resort to copying content (and fending off legal jousting).

Re: Index Scrape/Crawl of Solidworks Forum

Posted: Thu Mar 25, 2021 1:37 pm
by jcapriotti
Yeah, I imagine the Dassault legal team is more competent than the 3dswym team. :o

Re: Index Scrape/Crawl of Solidworks Forum

Posted: Tue Mar 30, 2021 8:31 am
by jmongi
I didn't consider the legal aspects of such an activity. Good point.

Re: Index Scrape/Crawl of Solidworks Forum

Posted: Tue Mar 30, 2021 6:18 pm
by colt
@SPerman The Wayback Machine will work great for surface content, but I don't think it will keep the source files for post attachments like macros or models. Will this stuff be completely lost, or is it going to be transferred to the new forum?