Sites use several methods to defend against this sort of scraping, direct linking or embedding. The basic, older ones include:
1. Checking the user's cookies, at a minimum to confirm that the user already has a session from a previous page on the site. Some sites go further and look for specific cookie or session variables that verify a genuine path through the site.
2. Checking the cgi.http_referer variable to see whether the user arrived from the expected source.
3. Checking whether cgi.http_user_agent looks like a known human browser, or at least does not look like a known bot.
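As a rough illustration of the three checks above, here is a minimal sketch of a server-side gate, written as a shell function that reads the standard CGI variables HTTP_COOKIE, HTTP_REFERER and HTTP_USER_AGENT. The cookie name SESSIONID, the example.com referer prefix and the bot patterns are all invented for the example. It is not how any particular site does it.

```shell
#!/bin/sh
# Hypothetical gate: prints "allow" or a "challenge: ..." reason.
# SESSIONID, https://example.com/ and the bot patterns are assumptions
# made up for this sketch.
check_request() {
    # 1. Cookie check: is there a session cookie from an earlier page?
    #    A ";" is prepended so the first cookie matches like any other.
    case ";${HTTP_COOKIE:-}" in
        *";SESSIONID="*|*"; SESSIONID="*) ;;
        *) echo "challenge: no session cookie"; return 1 ;;
    esac

    # 2. Referer check: did the visitor arrive from one of our own pages?
    case "${HTTP_REFERER:-}" in
        "https://example.com/"*) ;;
        *) echo "challenge: unexpected referer"; return 1 ;;
    esac

    # 3. User agent check: reject empty or obviously automated clients.
    case "${HTTP_USER_AGENT:-}" in
        ""|*[Ww]get*|*[Cc]url*|*[Bb]ot*|ColdFusion*)
            echo "challenge: bot-like user agent"; return 1 ;;
    esac

    echo "allow"
}
```

A client that replays the session cookie, sets a Referer header and sends a browser-like User-Agent passes all three checks, which is why they only stop casual attempts.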
More sophisticated methods exist, of course, but in my experience if you need more than the above then you are into the territory of requiring a captcha and/or making users register and log in.
Obviously (2) and (3) are easily spoofed by setting the headers manually. For (1), if you're using cfhttp or its equivalent in another language, you need to make sure that the cookies returned in the Set-Cookie header of the site's response are sent back in the headers of your subsequent request, using cfhttpparam. Various cfhttp wrappers and alternative libraries (such as Java wrappers that bypass the cfhttp layer) are available to do this. If you want a simple example of how it works, Ben Nadel has an old but good one here: https://www.bennadel.com/blog/725-maintaining-sessions-across-multiple-coldfusion-cfhttp-requests.htm
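To make the cookie round trip described above concrete outside ColdFusion, here is a minimal sketch in shell that reads raw response headers and builds the value for a Cookie request header. The function name cookie_header_from_response is my own invention. It keeps only each cookie's name=value pair and ignores attributes such as Path, Domain and Expires, which a real cookie jar would honour.

```shell
#!/bin/sh
# Read raw HTTP response headers on stdin and print a string suitable for
# a "Cookie:" request header, e.g. "CFID=123; CFTOKEN=abc".
# Hypothetical helper for illustration only: it ignores cookie attributes
# (Path, Domain, Expires, ...), so it is no substitute for a cookie jar.
cookie_header_from_response() {
    awk '
        tolower($0) ~ /^set-cookie:/ {
            line = $0
            sub(/\r$/, "", line)             # strip the trailing CR of CRLF
            sub(/^[^:]*:[ \t]*/, "", line)   # drop the "Set-Cookie:" prefix
            sub(/;.*$/, "", line)            # keep only the name=value pair
            out = (out == "" ? line : out "; " line)
        }
        END { if (out != "") print out }
    '
}
```

For example, you could pipe the output of curl -s -D - -o /dev/null "$url" into this function and pass the result to a later request as a Cookie header. In practice the cookie file options of wget or curl do the same job with less effort.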
With the pdf url from the link in your question, a couple of minutes tinkering in Chrome shows that if I lose the cookies from the previous page and keep the http_referer then I see the captcha challenge, but if I keep the cookies and lose the http_referer then I get directly through to the pdf. This confirms that they care about the cookies but not the referer.
Copy of Ben's example for SO completeness:
<cffunction
name="GetResponseCookies"
access="public"
returntype="struct"
output="false"
hint="This parses the response of a CFHttp call and puts the cookies into a struct.">
<!--- Define arguments. --->
<cfargument
name="Response"
type="struct"
required="true"
hint="The response of a CFHttp call."
/>
<!---
Create the default struct in which we will hold
the response cookies. This struct will contain structs
and will be keyed on the name of the cookie to be set.
--->
<cfset LOCAL.Cookies = StructNew() />
<!---
Get a reference to the cookies that were returned
from the page request. This will give us a numerically
indexed struct of cookie strings (which we will have
to parse out for values). BUT, check to make sure
that cookies were even sent in the response. If they
were not, then there is no work to be done.
--->
<cfif NOT StructKeyExists(
ARGUMENTS.Response.ResponseHeader,
"Set-Cookie"
)>
<!---
No cookies were sent back in the response. Just
return the empty cookies structure.
--->
<cfreturn LOCAL.Cookies />
</cfif>
<!---
ASSERT: We know that cookies were returned in the page
response and that they are available at the key,
"Set-Cookie" of the response header.
--->
<!---
Now that we know that the cookies were returned, get
a reference to the struct as described above.
--->
<!---
The cookies might come back as a struct or as a
string. If only ONE cookie is returned, it comes
back as a string. If that is the case, re-store it
as a struct.
--->
<cfif IsSimpleValue(ARGUMENTS.Response.ResponseHeader[ "Set-Cookie" ])>
<cfset LOCAL.ReturnedCookies = {} />
<cfset LOCAL.ReturnedCookies[1] = ARGUMENTS.Response.ResponseHeader[ "Set-Cookie" ] />
<cfelse>
<cfset LOCAL.ReturnedCookies = ARGUMENTS.Response.ResponseHeader[ "Set-Cookie" ] />
</cfif>
<!--- Loop over the returned cookies struct. --->
<cfloop
item="LOCAL.CookieIndex"
collection="#LOCAL.ReturnedCookies#">
<!---
As we loop through the cookie struct, get
the cookie string we want to parse.
--->
<cfset LOCAL.CookieString = LOCAL.ReturnedCookies[ LOCAL.CookieIndex ] />
<!---
For each of these cookie strings, we are going to
need to parse out the values. We can treat the
cookie string as a semi-colon delimited list.
--->
<cfloop
index="LOCAL.Index"
from="1"
to="#ListLen( LOCAL.CookieString, ';' )#"
step="1">
<!--- Get the name-value pair. --->
<cfset LOCAL.Pair = ListGetAt(
LOCAL.CookieString,
LOCAL.Index,
";"
) />
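<!---
Get the name as the first part of the pair
separated by the equals sign. Trim it, because
attributes after the first ";" usually arrive
with a leading space (e.g. " Path").
--->
<cfset LOCAL.Name = Trim( ListFirst( LOCAL.Pair, "=" ) ) />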
<!---
Check to see if we have a value part. Not all
cookies are going to send values of length,
which can throw off ColdFusion.
--->
<cfif (ListLen( LOCAL.Pair, "=" ) GT 1)>
<!--- Grab the rest of the list. --->
<cfset LOCAL.Value = ListRest( LOCAL.Pair, "=" ) />
<cfelse>
<!---
Since ColdFusion did not find more than one
value in the list, just get the empty string
as the value.
--->
<cfset LOCAL.Value = "" />
</cfif>
<!---
Now that we have the name-value data values,
we have to store them in the struct. If we are
looking at the first part of the cookie string,
this is going to be the name of the cookie and
its struct key.
--->
<cfif (LOCAL.Index EQ 1)>
<!---
Create a new struct with this cookie's name
as the key in the return cookie struct.
--->
<cfset LOCAL.Cookies[ LOCAL.Name ] = StructNew() />
<!---
Now that we have the struct in place, let's
get a reference to it so that we can refer
to it in subsequent loops.
--->
<cfset LOCAL.Cookie = LOCAL.Cookies[ LOCAL.Name ] />
<!--- Store the value of this cookie. --->
<cfset LOCAL.Cookie.Value = LOCAL.Value />
<!---
Now, this cookie might have more than just
the first name-value pair. Let's create an
additional attributes struct to hold those
values.
--->
<cfset LOCAL.Cookie.Attributes = StructNew() />
<cfelse>
<!---
For all subsequent calls, just store the
name-value pair into the established
cookie's attributes struct.
--->
<cfset LOCAL.Cookie.Attributes[ LOCAL.Name ] = LOCAL.Value />
</cfif>
</cfloop>
</cfloop>
<!--- Return the cookies. --->
<cfreturn LOCAL.Cookies />
</cffunction>
Assuming you have made a cfhttp request to the first page, https://www.osapublishing.org/boe/abstract.cfm?uri=boe-11-5-2745, passed the response into the function above and stored the result in a variable named cookieStruct, you can then use this inside subsequent cfhttp requests:
<cfloop item="strCookie" collection="#cookieStruct#">
<cfhttpparam type="COOKIE" name="#strCookie#" value="#cookieStruct[strCookie].Value#" />
</cfloop>
Edit: if you are using wget instead of cfhttp, you could try the approach from the answer to the question "How to get past the login page with Wget?", but without posting a username and password, since you don't actually need to submit a login form. For example:
# Get a session.
wget --save-cookies cookies.txt \
     --keep-session-cookies \
     --delete-after \
     "https://www.osapublishing.org/boe/abstract.cfm?uri=boe-11-5-2745"
# Now grab the page or pages we care about.
# The URL is quoted so that the shell does not treat "&" as "run in background".
# You may also need to add valid referer or user agent headers
# (wget's --referer and --user-agent options).
wget --load-cookies cookies.txt \
     "https://www.osapublishing.org/boe/viewmedia.cfm?uri=boe-11-5-2745&seq=0"
...although as others have pointed out, you may be violating the terms of service of the source, so I couldn't recommend actually doing this.