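For day-to-day work I wanted to pull specific information out of the web server's access log: which requested files could not be found, whether a visit came from Google, from Yahoo, or from somewhere else, and which requests were made by search-engine spiders. The approach is simple: open the log file, filter out the records you don't need, split each remaining record into fields, and list the results. Nearly all of the work is done by a single PHP function, preg_match(). The source code is below; study it for yourself!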
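Before the full listing, here is a minimal sketch of the "split each record into fields" step on its own. It assumes the log uses the Apache combined log format (client IP, time, request line, status code, size, referrer, user agent). The function name `parse_log_line` and the sample line are made up for illustration and are not part of the original tool.

```php
<?php
// Hypothetical helper: split one combined-format access log line into fields.
// Returns null when the line does not match the assumed format.
function parse_log_line($line)
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return array(
        'ip'      => $m[1], // client address
        'time'    => $m[2], // timestamp between the square brackets
        'method'  => $m[3], // GET, POST, ...
        'page'    => $m[4], // requested path
        'code'    => $m[5], // HTTP status code
        'referer' => $m[7], // where the visitor came from
        'agent'   => $m[8], // browser or crawler identification
    );
}

// Made-up sample line, for illustration only.
$sample = '192.0.2.1 - - [10/Oct/2005:13:55:36 +0800] "GET /missing.html HTTP/1.1" 404 209 "http://www.example.com/" "Mozilla/4.0"';
$fields = parse_log_line($sample);
echo $fields['page'], ' ', $fields['code'], "\n"; // prints "/missing.html 404"
```

Unlike the pattern in the listing, this one is anchored at the start of the line and accepts any request method, so a line that does not fit the format is rejected instead of producing empty cells.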
<html>
<head>
<title>
Simple tools for website logs
</title>
</head>
<body>
<form name="my_form" method="post">
Select your type :<br>
<select name="type">
<option value="">Pages not found (404)</option>
<option value="yahoo">Access from Yahoo</option>
<option value="google">Access from google</option>
<option value="msn">Access from Msn</option>
<option value="robot">Access by robots</option>
</select>
<input type="submit" name="submit" value="get the result">
</form>
<table border=1>
<tr bgcolor="#FFCCFF">
<td><font color="#000000">ClientIP</font></td>
<td><font color="#000000">AccessTime</font></td>
<td><font color="#000000">TargetPage</font></td>
<td><font color="#000000">Code</font></td>
<td><font color="#000000">FromURL</font></td>
<td><font color="#000000">Client ENV</font></td>
</tr>
<?PHP
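// register_globals was removed in PHP 5.4, so read the form field explicitly
// and accept only the values offered by the select box above.
$type = isset($_POST['type']) ? $_POST['type'] : '';
if (!in_array($type, array('yahoo', 'google', 'msn', 'robot'), true)) {
$type = '';
}
// Make sure the document root ends with exactly one slash.
$doc_path = rtrim($_SERVER["DOCUMENT_ROOT"], "/") . "/";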
if($type=='yahoo'){
$lines = file ($doc_path.'logs/access_log');
foreach ($lines as $line_num => $line) {
if (preg_match ("/yahoo/i",strtolower($line))) {
if (!preg_match ("/slurp/",strtolower($line))){
preg_match("/([0-9.]+)?([ -]+)?(\[)?([0-9a-zA-Z+: \/]+)?(\])?( \"GET \/)?([a-z0-9A-Z.\/\?&=%_\-:+]+)?( HTTP\/1\.[012]\" )?([0-9.]+)?( )?([0-9.\-]+)?( \")?([a-z0-9A-Z.\/\?&=%_\-:+]+)?(\" \")?(.*)/i",$line, $matches);
// Escape the log data before printing it, and close the row properly.
$cells = array_map('htmlspecialchars', array($matches[1], $matches[4], $matches[7], $matches[9], $matches[13], $matches[15]));
echo "<tr><td>".implode("</td><td>", $cells)."</td></tr>";
}
}
}
}elseif($type=="robot"){
$lines = file ($doc_path.'logs/access_log');
foreach ($lines as $line_num => $line) {
if (!preg_match("/robots.txt/i",$line)){
if (preg_match ("/(slurp)|(msnbot)|(googlebot)|(psbot)/i",strtolower($line))){
preg_match("/([0-9.]+)?([ -]+)?(\[)?([0-9a-zA-Z+: \/]+)?(\])?( \"GET \/)?([a-z0-9A-Z.\/\?&=%_\-:+]+)?( HTTP\/1\.[012]\" )?([0-9.]+)?( )?([0-9.\-]+)?( \")?([a-z0-9A-Z.\/\?&=%_\-:+]+)?(\" \")?(.*)/i",$line, $matches);
// Escape the log data before printing it, and close the row properly.
$cells = array_map('htmlspecialchars', array($matches[1], $matches[4], $matches[7], $matches[9], $matches[13], $matches[15]));
echo "<tr><td>".implode("</td><td>", $cells)."</td></tr>";
}
}
}
}elseif($type!=""){
$lines = file ($doc_path.'logs/access_log');
foreach ($lines as $line_num => $line) {
if (preg_match ("/".preg_quote($type, "/")."/i",$line)) {
if (!preg_match ("/".preg_quote($type, "/")."bot/i",$line)){
preg_match("/([0-9.]+)?([ -]+)?(\[)?([0-9a-zA-Z+: \/]+)?(\])?( \"GET \/)?([a-z0-9A-Z.\/\?&=%_\-:+]+)?( HTTP\/1\.[012]\" )?([0-9.]+)?( )?([0-9.\-]+)?( \")?([a-z0-9A-Z.\/\?&=%_\-:+]+)?(\" \")?(.*)/i",$line, $matches);
// Escape the log data before printing it, and close the row properly.
$cells = array_map('htmlspecialchars', array($matches[1], $matches[4], $matches[7], $matches[9], $matches[13], $matches[15]));
echo "<tr><td>".implode("</td><td>", $cells)."</td></tr>";
}
}
}
}else{
$lines = file ($doc_path.'logs/access_log');
foreach ($lines as $line_num => $line) {
if (preg_match ("/ 404 /i",$line)) {
if (!preg_match ("/robots.txt/",$line)){
preg_match("/([0-9.]+)?([ -]+)?(\[)?([0-9a-zA-Z+: \/]+)?(\])?( \"GET \/)?([a-z0-9A-Z.\/\?&=%_\-:+]+)?( HTTP\/1\.[012]\" )?([0-9.]+)?( )?([0-9.\-]+)?( \")?([a-z0-9A-Z.\/\?&=%_\-:+]+)?(\" \")?(.*)/i",$line, $matches);
// Escape the log data before printing it, and close the row properly.
$cells = array_map('htmlspecialchars', array($matches[1], $matches[4], $matches[7], $matches[9], $matches[13], $matches[15]));
echo "<tr><td>".implode("</td><td>", $cells)."</td></tr>";
}
}
}
}
?>
</table>
</body>
</html>
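The four branches in the listing differ only in how they filter lines, so the filtering step could be folded into a single predicate and the parsing and output code written once. The sketch below is one possible way to do that, not the original author's code. The name `line_matches_type` is hypothetical, the sample line is made up, and any unknown type falls back to the 404 report, which is stricter than the listing's catch-all branch.

```php
<?php
// Hypothetical predicate mirroring the filter conditions of the four branches.
function line_matches_type($line, $type)
{
    $lower = strtolower($line);
    switch ($type) {
        case 'robot':
            // Known crawlers, ignoring their robots.txt fetches.
            return strpos($lower, 'robots.txt') === false
                && preg_match('/slurp|msnbot|googlebot|psbot/', $lower) === 1;
        case 'yahoo':
            // Lines mentioning Yahoo, excluding its crawler (Slurp).
            return strpos($lower, 'yahoo') !== false
                && strpos($lower, 'slurp') === false;
        case 'google':
        case 'msn':
            // Lines mentioning the engine, excluding googlebot / msnbot.
            return strpos($lower, $type) !== false
                && strpos($lower, $type . 'bot') === false;
        default:
            // Missing pages, ignoring requests for robots.txt.
            return strpos($line, ' 404 ') !== false
                && strpos($line, 'robots.txt') === false;
    }
}

// Made-up sample line, for illustration only.
$line = '192.0.2.7 - - [10/Oct/2005:14:02:11 +0800] "GET /index.html HTTP/1.1" 200 1043 "-" "Googlebot/2.1"';
var_dump(line_matches_type($line, 'robot'));  // bool(true)
var_dump(line_matches_type($line, 'google')); // bool(false)
```

With a predicate like this, the page needs only one loop over the log: skip lines for which the function returns false, then parse and print the rest.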