262x Filetype PDF File size 0.52 MB Source: www.ijettcs.org
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 4, July – August 2013 ISSN 2278-6856
Enhancement of Security and Embedding
Capacity through Huffman Coding in
Steganography
1 2
Sanjay Bajpai , Dr. Kanak Saxena
1
Department of Computer applications, Lakshmi Narain College of Technology, RGPV, Bhopal
Kalchuri Nagar, Raisen Road, Bhopal (MP), INDIA.
2 Department of computer applications, Samrat Ashok Technological Institute
Civil Lines, Vidisha (M.P.), INDIA.
Abstract: In this paper, we incorporated Huffman coding also used GA in their proposed block truncation coding
algorithm in digital image steganography to enhance both the (BTC) method that can embed about 3 bits in each BTC-
data security and embedding capacity. Data is compressed by encoded block in an average. Elham Ghasemi et al [6]
variable length compression technique and then embedded in embedded the secret message in Discrete Wavelet
digital colored images using modified LSB technique. Pixels Transform coefficients by dividing the image into 4×4
are broken into RGB components and some of the least blocks. In our work, we have taken the digital colored
significant bits, based on certain criterion, of these images as the cover and text message as the host. Factors
components are used to embed the secret message. Security is
extended to a greater extent by using multi-keys for that play a vital role in digital image steganography are
embedding and data compression ratio varies from 45% to image distortion, embedding capacity and security of data
80% that depends on the type of data embedded. Here, we [7] and care is taken in the proposed algorithm to
have categorized data into three types, small length, larger optimize these factors. We have increased the security of
length and data from a particular domain. Our experimental data by using multi-key LSB substitution method and
results have shown that distortion in the stego-image is embedding capacity by compartmentalizing the pixels
negligible, very difficult to extract the message and into its components.
embedding capacity increased to a large extent.
Further security and embedding capacity is enhanced by
Keywords: Compression ratio, Embedding capacity, incorporating the Huffman coding algorithm and results
Distortion show that embedding capacity is increased from 45% to
1. INTRODUCTION 55 % that depends on the type of data chosen for hiding.
Remainder of the paper is organized as follows. In section
The present work shows the embedding of compressed 2, we shall briefly discuss about the related work and
data through Huffman coding in steganography. The Huffman coding algorithm. Section 3 describes the
native meaning of word “steganography” is hidden proposed method with analysis. Section 4 includes case
writing and is originated from the Greek language. It is studies, experimental results and comparisons with
an art and science in which secret data is hid in other previous works. Finally, the conclusion is presented in
medium known as cover and the thing which is hidden is section 5.
known as host [1]. The basic advantage of steganography
is to keep the unwanted persons or intruders away from 2. RELATED WORK
the actual fact and this technique is successful since they Most popular and core concept of steganography is to
are not able to see the hidden message and only cover is hide the secret message in digital images by changing the
visible. Most common method employed in least significant bits of the pixels. Xin Liao and et al. [8]
steganography is the LSB substitution method in which focused on finding the edge pixels of the cover image to
two or three bits are replaced by the bits of the secret hide the secret message because these are the locations
message so that distortion is not visible by human eyes where changes are least visible. Rosziati Ibrahim and et
[2], [3] but they are unable to give high embedding al. [9] proposed a SIS (Steganography Imaging System)
capacity and because of the same pattern, steganalysis in which they converted the secret message into binary
techniques can detect them. El Safy et al. [4] used codes and then embedded the two bits in each pixel.
adaptive data embedding technique involving Optimal Mahmud Hasan and et al. [10] proposed a block
Pixel Adjustment Process to hide data in the Integer processing mechanism in which they divided the image
Wavelet coefficients of the cover image. Recent trends into a number of 4×4 non-overlapping blocks. Most
show that genetic algorithm (GA) is commonly used to Frequent Pixels (MFPs) and Second Most Frequent Pixels
embed the secret message and Chin-Chen Chang et al [5] (SMFPs) of each block were identified and after deleting
Volume 2, Issue 4 July – August 2013 Page 73
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 4, July – August 2013 ISSN 2278-6856
their occurrences, remaining pixels were over written by Ax
the encoded bits of the secret message. L(C, X) P(x)l(x) pili (3)
2.1 Huffman Coding x Ax i 1
Huffman coding is one of the popular entropy encoding Fig.2 shows the result of coding and compression on the
algorithms used for lossless data compression [11]. It is given Ensemble X stated below-
classified into Static Huffman coding and Adaptive A = { a , b , c , d }
x
Huffman coding and later is more beneficial than former P = {1/2 , 1/4 , 1/8, 1/8 }
x
in terms of number of internal nodes, path length and
height of the tree in memory representation [12]. To do
the encoding, at first characters of the file with their
frequency are stored in a list and sorted in the increasing
order of frequency. To achieve optimality, Huffman joins
the two symbols with lowest probability/frequency and
replaces them with a new fictive node whose probability
is the sum of the other nodes' probabilities. The two
removed symbols are now two nodes in the binary tree.
The fictive node is their parent and is not inserted in the
binary tree yet but kept in the input list. We repeat this Figure 2 Code and code length of the symbols given
process until input list becomes empty and at the same probabilities
time binary tree is constructed whose frequency indicates
the total number of symbols in the file. Code for each
symbol is constructed by traversing the tree from the root
to leaf by assigning 0 for left visit and 1 for right visit.
Encoded tree for a sample data is shown in Fig.1.
Figure 3 Contiguous interval representation of symbols
2.2 Huffman Decoding
Decoding procedure can be performed by the binary
Figure 1 Encoded tree for ETASNO search algorithm if the Huffman tree is stored in the
array. This can be demonstrated by taking the sample
For example, here TEA will be encoded as 10 00 010 and data from Fig.3 where codes of 8 symbols are shown in
SEA will be encoded as 011 00 010 [12]. These codes are the increasing order [13]. Let the input code be ‘01010’,
in the form of bits and space occupied for one symbol is then fifth [(1+8) / 2 = 5] entry ‘01101’ is tested. Since
reduced which in general take 8 bits. An ensemble X is a ‘01010’ is smaller than ‘01101’ so the third [(1+4)/2=3]
triple (x, A , P ) entry is tested. This process is continued until the given
x x
x: value of a random variable code is found by reducing the list into half of its size in
A: set of possible values for x , A ={a , a , …, a} each iteration or the size of the list becomes only one.
x x 1 2 i
P : probability for each value , P ={p , p , …, p}
x x 1 2 i 3. PROPOSED METHOD
where P(x) = P(x = a) = p, p > 0, and pi 1
i i i
Shannon information content of x is shown in equation 3.1 Analysis
(1) In the proposed work, we are using multi-key LSB
h(x) = log2(1/P(x)) (1) substitution method in which we are compartmentalizing
and entropy of x is stated in equation (2) the pixels into RGB components to make it more secure
H(x) P(x).log 1 (2) and every pixel is capable of hiding one character so
p(x) embedding capacity varied linearly, that is
x Ax
and the expected length L(C,X) of symbol code C for X is N(c)N(p) (4)
stated in equation (3) which achieves as much where N(c) is the number of characters that can be
compressiion as possible – embedded and N(p) is the number of pixels of the image.
We start by accumulating all the pixels of the cover image
Volume 2, Issue 4 July – August 2013 Page 74
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 4, July – August 2013 ISSN 2278-6856
in an array P. In our proposed algorithm, we are without causing any distortion to the cover image and use
deploying four keys for embedding and extraction. KEY1 of multi keys make it more secure.
(k11, k12, k13) which is a composite key and is used for 3.2 Algorithm (Embedding Process)
selecting the bits of the pixel components and KEY2 1. Store all the different characters of a file and their
(k21, k22, k23) is used for deciding the pattern of the bits frequency in a two dimensional array T1 in the
of the characters comprising the secret message. Two increasing order of frequency and accumulate all the
keys KEY3 and KEY4 are formulated based on the size pixels of the image in one dimensional array P of size
and texture of the cover image to fix the region for M*N where M×N is the size of the image.
embedding and to decide the gap between the pixels if 2. Construct the Huffman binary tree T by merging the
needed. elements from left to right until all the elements of the
To increase the embedding capacity and to make it array T1 are exhausted using the Huffman coding
further secure we incorporated the Huffman compression algorithm.
technique. We assumed that the secret message which is 3. Generate the Huffman codes for every character in the
to be embedded may vary from a few lines to thousands of array T1 using the Huffman binary tree T and store
lines. If message length is very large then Huffman them in another array T2 along with the code length.
compression technique plays a vital role in increasing the 4. Calculate the keys KEY3 and KEY4 by incorporating
embedding capacity and security. We used the variable the size and texture of the cover image and length of
length code to make it further efficient. If T is a tree the secret message. KEY3 fixes the starting position of
corresponding to a prefix code then number of bits the cover image and KEY4 fixes the gap between two
required to encode a file is pixels.
B(T) f (c)dT(c) (5)
5. Set the variable MaxBits to 9.
c C 6. Read each code from the array T2 one by one, find its
where ƒ(c) denote the frequency of a character c and d (c)
T length ‘Len’ and repeat steps from 6 to 10.
denote the depth of c’s leaf in tree T and is also the length 7. Repeat steps 8 and 9 until (Len ≥ MaxBits).
of the codeword for character c [14]. 8. Select a pixel using keys KEY3 and KEY4 and
After calculating the probabilities of different symbols bifurcate it into its components and embed first nine
that are occurring in the secret message, Huffman codes bits of the code, 3 bits in each RGB components.
are obtained by generating the prefix-free codes and 9. Reduce the length of the code by MaxBits that have
placing the symbols at the leaves of a code tree. This been embedded.
helps in obtaining the instantaneously decodable code of 10. Embed the remaining bits of the code in next pixel
minimal expected length L satisfying the equation using the keys KEY1 and KEY2.
11. Rejoin all the pixels of the array P to form the stego-
H(S) L H(S) 1 (6) image
where H(S) is the entropy of a source S or the expected
information of a symbol [15], [16]. If source S is emitting 3.3 Algorithm (Extraction Process)
one of symbols s , s , …, s at each time with probabilities 1. Accumulate pixels of the stego-image in the byte array
1 2 n
p1, p2, …, pn respectively independent of the symbols T and extract the keys KEY1, KEY2, KEY3 and
emitted at other times then in a very long string of K KEY4.
emissions we expect to get Kp , Kp , …, Kp instances of 2. Read the length of the code from the input array T2.
1 2 n
the symbols s , s , …, s respectively. These emitted 3. Pick up pixels from the array T using KEY3 and
1 2 n
symbols are independent and identically distributed. This KEY4.
is supported by the law of large numbers. If the expected 4. Extract the bits from the pixels using keys KEY1 and
or mean number of occurrences of symbol s in K
1 KEY2 to generate the code.
independent repetitions is Kp where p is the probability
1 1 5. Decode the code to get the symbol using decoding
of getting s in a single trial then standard deviation
1 process discussed in 2.2 and append the symbol in
around this mean is sqrt{Kp1(1-p1)}. Therefore, fractional String S.
one-std spread around the mean is sqrt{(1-p ) ∕ (Kp ), that
1 1 6. Repeat steps 2 to 5 until all the elements of array T2
is, for large K, the number of occurrences of s is
1 are exhausted.
relatively tightly concentrated around the mean value of 7. The string S is the extracted message.
Kp [15]. In our proposed method, every pixel of the
1
cover image can be embedded and to make it distortion 4. EXPERIMENTAL RESULTS
free, embedding is done in all the components of the We selected and categorized data into three types, that is,
pixel. Since data is compressed before embedding micro (message length less than 100 characters), macro
therefore a message of very long length can be embedded (message length of more than 1000 characters) and
Volume 2, Issue 4 July – August 2013 Page 75
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 4, July – August 2013 ISSN 2278-6856
texture (message from a particular domain) and tested our For normal messages, embedding capacity is almost
results on 50 different digital colored images out of which doubled as can be seen in table 1 where compression ratio
5 results are shown. Different types of images are chosen varies from 47% to 52% and it gives very significant
as the cover and robustness is tested by selecting results for small messages where compression ratio varies
contiguous pixels of any region of the image to embed the from 75% to 85% and these compression ratios totally
message. Images img1c, img2c, img3c, img4c, img5c are depend on the frequencies of the symbols occurring in the
the cover images and img1s, img2s, img3s, img4s, img5s message. Compression through Huffman coding not only
are the stego-images. Micro message of length 92 is increases the embedding capacity but security is also
embedded in image ‘img1c’ in Fig. 4(a) and increased as the symbols are embedded in the form of
corresponding stego-image ‘img1s’ is shown in Fig. 4(b). codes. Comparative results in peak position for macro and
Macro message of length 1537 and 1687 is embedded in texture type messages are shown in table 2.
‘img2c’ in Fig. 5(a) and ‘img3c’ in Fig. 6(a) respectively
and corresponding stego-images ‘img2s’ and ‘img3s’ are Result Images
shown in Fig. 5(b) and Fig. 6(b). Similarly, data of same
texture of length 1503 and 1714 are embedded in ‘img4c’
and ‘img5c’ whose stego-images are ‘img4s’ and ‘img5s’
that are shown in Fig. 7(a), 7(b), 8(a), 8(b) respectively.
Compression ratio by Huffman coding depends on the
length and nature of data that is why we took different
samples of three types of data whose result is shown in Figure 4(a) img1c (Cover image)
Table 1.
Maximum performance given by our algorithm is Table 2: Comparative Performance Measurements
compared with the algorithms proposed by Mahmud
Hasan et al. [10] and our previous designed algorithm
which clearly show that our results have been improved
as shown in Table 2. Mahmud Hasan et al. [10]
compromised with only 32 bits out of 128 bits after
performing calculations in their block processing
approach and embedded characters of 7 bits. On an
average, 4.57 characters can be embedded in 32 bits and
considering the complete block of 4×4, 3.50 pixels are
required to embed one character. In our previous
algorithm, we stored 8 bits of a character per pixel by
compartmentalizing it into its components and in no case
failure occurred. In the proposed method, maximum 9
bits can be stored per pixel but it is observed that very few
symbols are obtained whose code length is greater than 9. Figure 4 (b) img1s (Stego image)
In general, code length varies from 4 to 7 for macro and
texture type of messages and for micro message, code
length varies from 1 to 4.
Table 1: Results of Compression on Different type of data
Message Message Number No. of
Type Length of Bits Compression Bits
(in per Ratio Embedde
chars) Symbol d
Micro 12 1.67 82.46 20
Micro 92 1.83 Figure 5 (a) img2c (Cover image)
77.17 168
Macro 1537 4.22 47.16 6497
Macro 1687 4.24 47.00 7152
Texture 1503 3.97 50.43 5960
Texture 1714 3.80 52.52 6510
Figure 5 (b) img2s (Stego image)
Volume 2, Issue 4 July – August 2013 Page 76
no reviews yet
Please Login to review.