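Today I ran into a problem that matches the one described in this Stack Overflow question: https://stackoverflow.com/questions/25726048/logstash-geoip-on-x-fowarded-for
The X-Forwarded-For field holds a variable number of IP addresses, so the geoip lookup on it fails. The answer given on Stack Overflow does not really work either: it splits one record into two, and it also introduces quotation marks, so geoip still cannot be extracted.
My final solution is: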
1. Configure the Apache log format as:

    LogFormat "\"%{X-Forwarded-For}i\" %h %l %u %t \"%r\" %>s %b"
    TransferLog "/data/log/access_log"
2. Define the grok pattern in Logstash:

    vi vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.0.2/patterns/grok-patterns   # append the following two lines at the end

    ANYTHING (.*)
    FORWARDCOMMONAPACHELOG "%{ANYTHING:clientip}" %{IPORHOST:directip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
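To see what this pattern captures before any cleanup, here is a rough Python approximation of FORWARDCOMMONAPACHELOG. It is only a sketch: the regex is a hand-simplified stand-in for the grok sub-patterns (IPORHOST, HTTPDUSER, USER and HTTPDATE are reduced to loose character classes, and the rawrequest fallback is dropped), and the sample log line with its addresses is made up for illustration.

```python
import re

# Simplified stand-in for FORWARDCOMMONAPACHELOG; not the real grok expansion.
FORWARD_LOG_RE = re.compile(
    r'"(?P<clientip>.*)" '  # quoted X-Forwarded-For value, greedy like ANYTHING
    r'(?P<directip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d+) (?P<bytes>\d+|-)'
)

# Hypothetical log line written in the LogFormat from step 1.
sample_line = (
    '"203.0.113.7, 10.0.0.2" 10.0.0.1 - - '
    '[10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
)

fields = FORWARD_LOG_RE.match(sample_line).groupdict()
print(fields["clientip"])  # the whole comma-separated list, quotes removed
print(fields["directip"])  # the address that connected to Apache directly
```

At this stage clientip still holds the full comma-separated list, which is why geoip cannot use the field as it is.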
3. In the config file, use mutate's gsub to clean up the field (it replaces the first comma in clientip and everything after it with an empty string):

    input {
      beats {
        port => "5044"
      }
    }
    filter {
      grok {
        match => { "message" => "%{FORWARDCOMMONAPACHELOG}" }
      }
      mutate {
        gsub => [ "clientip", ",.*", "" ]
      }
      geoip {
        source => "clientip"
      }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
    output {
      stdout { codec => rubydebug }
    }
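The gsub rule in the mutate block is easy to check outside Logstash. Below is a minimal Python sketch of the same substitution; the helper name first_forwarded_ip and the sample header values are hypothetical, and Python's re.sub stands in for the Ruby regex engine that Logstash uses.

```python
import re


def first_forwarded_ip(clientip: str) -> str:
    """Drop the first comma and everything after it,
    mirroring gsub => [ "clientip", ",.*", "" ]."""
    return re.sub(r",.*", "", clientip)


# A header that passed through two proxies: only the leftmost entry is kept.
print(first_forwarded_ip("203.0.113.7, 10.0.0.2, 10.0.0.3"))

# A header with a single address is left unchanged.
print(first_forwarded_ip("203.0.113.7"))
```

Keeping only the leftmost entry relies on the usual convention that each proxy appends its peer's address to the end of X-Forwarded-For, so the first value is the one closest to the original client.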
Original source:
http://blog.too2.net/?p=370
If you repost this, please credit: 辛碌力成 [http://blog.too2.net]