Counting the occurrences of words in a text file
I have a text file with tweets, and I'm required to count the number of times a word is mentioned in them. For example, the file contains:
Apple iPhone X is going to worth a fortune
The iPhone X is Apple's latest flagship iPhone. How will it pit against its competitors?
And let's say I want to count how many times the word iphone is mentioned in the file. Here's what I've tried:
cut -f 1 Tweet_Data | grep -i "iPhone" | wc -l
It certainly works, but I'm confused about the wc command in Unix. What's the difference if I try something like this:
cut -f 1 Tweet_Data | grep -c "iPhone"
where -c is used instead? Both of these yield different results on a large file full of tweets, and I'm confused as to how they work. Which method is best for counting occurrences?
Given such a requirement, I would use GNU grep (for the -o option), then pass the output through wc to count the total number of occurrences:
$ grep -o -i iphone Tweet_Data | wc -l
3
Plain grep -c on the data counts the number of lines that match, not the total number of words that match. The -o option tells grep to output each match on its own line, no matter how many times the match was found in the original line.
wc -l then tells the wc utility to count the number of lines. Since grep has put each match on its own line, this is the total number of occurrences of the word in the input.
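To see the difference concretely, here is a minimal sketch using a made-up one-line sample (the filename and text are illustrative, not from your data):

```shell
# A single line that mentions the word twice
printf 'iPhone vs iPhone\n' > sample.txt

# -c counts matching LINES: one line matches, so this prints 1
grep -c -i iphone sample.txt

# -o emits each match on its own line; wc -l then counts matches: 2
grep -o -i iphone sample.txt | wc -l
```

On a file of tweets, every tweet that mentions the word more than once accounts for the discrepancy between the two commands.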
If GNU grep is not available (or desired), you can transform the input with tr so that each word is on its own line, then use grep -c to count:
$ tr '[:space:]' '[\n*]' < Tweet_Data | grep -i -c iphone
3
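To illustrate why this works, here is the same made-up one-line sample passed through the tr step (the text is illustrative only):

```shell
# tr replaces every whitespace character with a newline,
# so each word lands on its own line
printf 'iPhone vs iPhone\n' | tr '[:space:]' '[\n*]'

# now a line can match at most once, so grep -c counts words: 2
printf 'iPhone vs iPhone\n' | tr '[:space:]' '[\n*]' | grep -i -c iphone
```

Note that this counts any line containing the word, so it would also match words like "iphones"; anchor the pattern (e.g. grep -i -c -x iphone) if you need exact whole-word matches.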