<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="keywords" content="awk, shell, script" />
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/twilight.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
<link rel="alternate" lang="fr" xml:lang="fr" title="découper un fichier par mots clés" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-18-split-a-file-by-keyword/" />
<link rel="alternate" lang="en" xml:lang="en" title="split a file by keyword" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/" />
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
<script type="text/javascript" src="/Scratch/js/index.js"></script>
<!--[if lt IE 9]>
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
<![endif]-->
<!-- < % if containMaths %>
<script type="text/javascript" src="/Scratch/js/MathJax/MathJax.js"></script>
< % end %>
-->
<title>split a file by keyword</title>
</head>
<body lang="en">
<script type="text/javascript">// <![CDATA[
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
// ]]>
</script>
<div id="content">
<div id="choix">
<div class="return"><a href="#entete">&darr; Menu &darr;</a></div>
<div id="choixlang">
<a href="/Scratch/fr/blog/2010-02-18-split-a-file-by-keyword/" onclick="setLanguage('fr')">en Français</a>
</div>
</div>
<img src="/Scratch/img/presentation.png" alt="Presentation drawing"/>
<div id="titre">
<h1>
split a file by keyword
</h1>
</div>
<div class="flush"></div>
<div class="flush"></div>
<div id="afterheader">
<div class="corps">
<p>Strangely enough, I didn&rsquo;t find any built-in tool to split a file by keyword, so I wrote one myself in <code>awk</code>. I put it here mostly for myself, but it could also help someone else.
The following code starts a new output file at each line containing the word <code>UTC</code>.</p>
<div><pre class="twilight">
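<span class="Comment"><span class="Comment">#</span>!/usr/bin/awk -f</span>
<span class="Comment"><span class="Comment">#</span> Lines before the first UTC line go to fic.000</span>
<span class="Entity">BEGIN</span>{i=0; FIC=<span class="String"><span class="String">&quot;</span>fic.000<span class="String">&quot;</span></span>;}
<span class="Comment"><span class="Comment">#</span> Each line containing UTC starts a new output file</span>
<span class="StringRegexp"><span class="StringRegexp">/</span>UTC<span class="StringRegexp">/</span></span> {
    close(FIC);
    i+=1;
    FIC=<span class="SupportFunction">sprintf</span>(<span class="String"><span class="String">&quot;</span>fic.%03d<span class="String">&quot;</span></span>,i);
}
<span class="Comment"><span class="Comment">#</span> Append every line to the current output file</span>
{<span class="SupportFunction">print</span> <span class="Variable"><span class="Variable">$</span>0</span>&gt;&gt;FIC}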
</pre></div>
<p>In my real-world case, I wanted one file per day. Each line containing UTC has the following format:</p>
<pre class="twilight">
Mon Dec 7 10:32:30 UTC 2009
</pre>
<p>I ended up with the following code:</p>
<div><pre class="twilight">
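<span class="Comment"><span class="Comment">#</span>!/usr/bin/awk -f</span>
<span class="Comment"><span class="Comment">#</span> Lines before the first UTC line go to fic.000</span>
<span class="Entity">BEGIN</span>{i=0; FIC=<span class="String"><span class="String">&quot;</span>fic.000<span class="String">&quot;</span></span>;}
<span class="StringRegexp"><span class="StringRegexp">/</span>UTC<span class="StringRegexp">/</span></span> {
    <span class="Comment"><span class="Comment">#</span> Weekday, month and day identify the day, e.g. MonDec7</span>
    date=<span class="Variable"><span class="Variable">$</span>1</span><span class="Variable"><span class="Variable">$</span>2</span><span class="Variable"><span class="Variable">$</span>3</span>;
    <span class="Keyword">if</span> ( date != olddate ) {
        <span class="Comment"><span class="Comment">#</span> New day: switch to the next output file</span>
        close(FIC);
        olddate=date;
        i+=1;
        FIC=<span class="SupportFunction">sprintf</span>(<span class="String"><span class="String">&quot;</span>fic.%03d<span class="String">&quot;</span></span>,i);
    }
}
<span class="Comment"><span class="Comment">#</span> Append every line to the current output file</span>
{<span class="SupportFunction">print</span> <span class="Variable"><span class="Variable">$</span>0</span>&gt;&gt;FIC}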
</pre></div>
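<p>As a rough usage sketch, the per-day split can be tried on a small made-up log. The file name <code>sample.log</code>, its three entries and the temporary directory are assumptions for illustration only, and the awk program is passed inline instead of being saved as a script.</p>
<div><pre class="twilight">
```shell
#!/bin/sh
# Work in a throwaway directory so existing fic.* files are not appended to.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample log: two entries on Dec 7, one on Dec 8.
printf '%s\n' \
    'Mon Dec 7 10:32:30 UTC 2009' \
    'first entry' \
    'Mon Dec 7 18:05:12 UTC 2009' \
    'second entry' \
    'Tue Dec 8 09:00:00 UTC 2009' \
    'third entry' > sample.log

# Per-day split: a new output file starts when the date fields change.
awk '
/UTC/ {
    date = $1 $2 $3
    if (date != olddate) {
        olddate = date
        i += 1
        FIC = sprintf("fic.%03d", i)
    }
}
{ print $0 >> FIC }
' sample.log

# List the generated files.
ls fic.*
```
</pre></div>
<p>With this input, <code>fic.001</code> should receive the four Dec 7 lines and <code>fic.002</code> the two Dec 8 lines.</p>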
</div>
<div id="choixrss">
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
Subscribe
</a>
</div>
<script type="text/javascript">
$(document).ready(function(){
$('#comment').hide();
$('#clickcomment').click(showComments);
});
function showComments() {
$('#comment').show();
$('#clickcomment').fadeOut();
}
document.write('<div id="clickcomment">Comments</div>');
</script>
<div class="flush"></div>
<div class="corps" id="comment">
<h2 class="first">comments</h2>
<noscript>
You must enable JavaScript to comment.
</noscript>
<script type="text/javascript">
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
var idcomments_post_id = '/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/';
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/';
</script>
<span id="IDCommentsPostTitle" style="display:none"></span>
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
</div>
<div id="entete" class="corps_spaced">
<div id="liens">
<ul><li><a href="/Scratch/en/">Home</a></li>
<li><a href="/Scratch/en/blog/">Blog</a></li>
<li><a href="/Scratch/en/softwares/">Software</a></li>
<li><a href="/Scratch/en/about/">About</a></li></ul>
</div>
<div class="flush"></div>
<hr/>
<div id="next_before_articles">
<div id="previous_articles">
previous entries
<div class="previous_article">
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/"><span class="nicer">«</span>&nbsp;Pragmatic Regular Expression Exclude (2)</a>
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/"><span class="nicer">«</span>&nbsp;Pragmatic Regular Expression Exclude</a>
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2010-01-12-antialias-font-in-Firefox-under-Ubuntu/"><span class="nicer">«</span>&nbsp;antialias font in Firefox under Ubuntu</a>
</div>
</div>
<div id="next_articles">
next entries
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/">When regexp is not the best solution&nbsp;<span class="nicer">»</span></a>
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-03-22-Git-Tips/">Git Tips&nbsp;<span class="nicer">»</span></a>
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-03-23-Encapsulate-git/">Encapsulate git&nbsp;<span class="nicer">»</span></a>
</div>
</div>
<div class="flush"></div>
</div>
</div>
<div id="bottom">
<div>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
</div>
<div id="lastmod">
Created: 02/18/2010
Modified: 05/09/2010
</div>
<div>
Entirely done with
<a href="http://www.vim.org">Vim</a>
and
<a href="http://nanoc.stoneship.org">nanoc</a>
</div>
<div>
<a href="/Scratch/en/validation/">Validation</a>
<a href="http://validator.w3.org/check?uri=referer"> [xhtml] </a>
.
<a href="http://jigsaw.w3.org/css-validator/check/referer?profile=css3"> [css] </a>
.
<a href="http://validator.w3.org/feed/check.cgi?url=http%3A//yannesposito.com/Scratch/en/blog/feed/feed.xml">[rss]</a>
</div>
</div>
<div class="clear"></div>
</div>
<script type="text/javascript">
var clicky = { log: function(){ return; }, goal: function(){ return; }};
var clicky_site_id = 66374971;
(function() {
var s = document.createElement('script');
s.type = 'text/javascript';
s.async = true;
s.src = ( document.location.protocol == 'https:' ? 'https://static.getclicky.com/js' : 'http://static.getclicky.com/js' );
( document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0] ).appendChild( s );
})();
</script>
<noscript><p><img alt="Clicky" width="1" height="1" src="http://in.getclicky.com/66374971ns.gif" /></p></noscript>
</body>
</html>