{"id":64687,"date":"2025-05-03T10:51:49","date_gmt":"2025-05-03T02:51:49","guid":{"rendered":"https:\/\/fwq.ai\/blog\/64687\/"},"modified":"2025-05-03T10:51:49","modified_gmt":"2025-05-03T02:51:49","slug":"java%e7%88%ac%e8%99%ab%e7%99%bb%e5%bd%95%e8%8e%b7%e5%8f%96html%e9%a1%b5%e9%9d%a2","status":"publish","type":"post","link":"https:\/\/fwq.ai\/blog\/64687\/","title":{"rendered":"Java crawler: logging in and fetching an HTML page"},"content":{"rendered":"<blockquote><p>\n  This tutorial is a step-by-step guide for Java developers who need to log in to a website and fetch a target page. The steps are: create an HTTP client, set up the login form data, build the login request, send the login request, get the login cookies, build the page request, add the cookies to the request, send the page request, and read the page HTML.\n<\/p><\/blockquote>\n<p><img decoding=\"async\" src=\"https:\/\/img.php.cn\/upload\/article\/202411\/04\/2024110407072075523.jpg\" class=\"aligncenter\" title=\"Illustration: Java crawler logging in and fetching an HTML page\" alt=\"Illustration: Java crawler logging in and fetching an HTML page\" \/><\/p>\n<p><strong>How to log in and fetch an HTML page with a Java crawler<\/strong><\/p>\n<p><strong>Steps:<\/strong><\/p>\n<p><strong>1. Create an HTTP client:<\/strong><br \/>Use a library (for example, HttpClient or HttpURLConnection) to create an HTTP client for sending requests to the target site.<\/p>\n<p><strong>2. 
Set up the login form data:<\/strong><br \/>Collect the fields of the login form (for example, username and password) and wrap them as key-value pairs in Java code.<\/p>\n<p><strong>3. Build the login request:<\/strong><br \/>Use the HTTP client to build a POST request, specifying the login URL and the login form data.<\/p>\n<p><strong>4. Send the login request:<\/strong><br \/>Send the login request to the server and store the server's response in a response object.<\/p>\n<p><strong>5. Get the login cookies:<\/strong><br \/>If the login succeeds, the server sets one or more cookies in the response. Retrieve these cookies and keep them in a cookie store.<\/p>\n<p><strong>6. Build the page request:<\/strong><br \/>Build a GET request for the page to fetch, specifying its URL.<\/p>\n<p><strong>7. Add the cookies to the request:<\/strong><br \/>Add the cookies obtained from the login response to the page request. A client configured with a cookie store does this automatically.<\/p>\n<p><strong>8. Send the page request:<\/strong><br \/>Send the page request with the cookies from the cookie store.<\/p>\n<p><strong>9. 
Read the page HTML:<\/strong><br \/>Store the server's response to the page request in a response object, then read the page's HTML from it.<\/p>\n<p><strong>Example code:<\/strong><\/p>\n<pre>import org.apache.http.NameValuePair;\nimport org.apache.http.client.CookieStore;\nimport org.apache.http.client.entity.UrlEncodedFormEntity;\nimport org.apache.http.client.methods.CloseableHttpResponse;\nimport org.apache.http.client.methods.HttpGet;\nimport org.apache.http.client.methods.HttpPost;\nimport org.apache.http.cookie.Cookie;\nimport org.apache.http.impl.client.BasicCookieStore;\nimport org.apache.http.impl.client.CloseableHttpClient;\nimport org.apache.http.impl.client.HttpClients;\nimport org.apache.http.message.BasicNameValuePair;\nimport org.apache.http.util.EntityUtils;\n\nimport java.io.IOException;\nimport java.nio.charset.StandardCharsets;\nimport java.util.ArrayList;\nimport java.util.List;\n\npublic class LoginAndCrawl {\n\n    public static void main(String[] args) throws IOException {\n        String loginUrl = \"https:\/\/www.example.com\/login\";\n        String pageUrl = \"https:\/\/www.example.com\/page\";\n        String username = \"username\";\n        String password = \"password\";\n\n        \/\/ The client records cookies from responses in this store and\n        \/\/ sends them back automatically on later requests.\n        CookieStore cookieStore = new BasicCookieStore();\n\n        try (CloseableHttpClient httpClient = HttpClients.custom()\n                .setDefaultCookieStore(cookieStore)\n                .build()) {\n\n            \/\/ Log in\n            List&lt;NameValuePair&gt; loginForm = new ArrayList&lt;&gt;();\n            loginForm.add(new BasicNameValuePair(\"username\", username));\n            loginForm.add(new BasicNameValuePair(\"password\", password));\n            HttpPost loginRequest = new HttpPost(loginUrl);\n            loginRequest.setEntity(new UrlEncodedFormEntity(loginForm, StandardCharsets.UTF_8));\n            try (CloseableHttpResponse loginResponse = httpClient.execute(loginRequest)) {\n                \/\/ Consume the body so the connection can be reused\n                EntityUtils.consume(loginResponse.getEntity());\n            }\n\n            \/\/ Get the cookies set by the login response,\n            \/\/ for example to check that a session cookie is present\n            List&lt;Cookie&gt; cookies = cookieStore.getCookies();\n\n            \/\/ Fetch the page; the client attaches the stored cookies\n            HttpGet pageRequest = new HttpGet(pageUrl);\n            
String html;\n            try (CloseableHttpResponse pageResponse = httpClient.execute(pageRequest)) {\n                \/\/ Read the HTML\n                html = EntityUtils.toString(pageResponse.getEntity(), StandardCharsets.UTF_8);\n            }\n\n            \/\/ Parse the HTML ...\n        }\n    }\n}<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial is a step-by-step guide for Java developers who need to log in to a website and fetch a target page. The steps are: create an HTTP client, set up the login form data, build the login request, send the login request, get the login cookies, build the page request, add the cookies to the request, send the page request, and read the page HTML. How to log in and fetch an HTML page with a Java crawler Steps: 1. Create an HTTP client: Use a library (for example, HttpClient or HttpURLConnection) to create an HTTP client for sending requests to the target site. 2. 
Set up the login form data: Collect the fields of the login form (for example, username and password) and wrap them as key-value pairs in Java code. 3. Build the login request: Use the HTTP client to build a POST request, specifying the login URL and the login form data. 4. Send the login request: Send the login request to the server and store the server's response in a response object. 5. Get the login cookies: If the login succeeds, the server sets one or more cookies in the response. Retrieve these cookies and keep them in a cookie store. 6. 
Build the page request: Build a GET request for the page to fetch [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[],"class_list":["post-64687","post","type-post","status-publish","format-standard","hentry","category-16"],"_links":{"self":[{"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/posts\/64687","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/comments?post=64687"}],"version-history":[{"count":0,"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/posts\/64687\/revisions"}],"wp:attachment":[{"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/media?parent=64687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/categories?post=64687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fwq.ai\/blog\/wp-json\/wp\/v2\/tags?post=64687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}