Transfer Learning

I have seen berak’s work on using transfer learning with SqueezeNet. My ideal network would have 16 input neurons and 8 output neurons, but SqueezeNet expects its input resized to 227 x 227. Would there be a better layer / network for my purpose?

Thanks for your time.

maybe you can try to explain, what your prospective NN is for ?
what are you trying to solve, what would be the inputs / outputs here ?

transfer learning only works well if you’re in the same domain as the pretrained model (e.g. image classification),
so the cats & dogs example using squeezenet (which was trained on imagenet, already containing those classes) was quite a “low hanging fruit”

btw, welcome back !

Hey thanks, it’s great to get in touch with you.

Perhaps I am going about this all wrong. What I’m doing is trying to teach the network the multiplication of unit octonions. I have 2 x 8 input floats per multiplication, and 1 x 8 output floats.

P.S. I tried the standard ANN, but it didn’t produce acceptable results – GitHub - sjhalayka/ann_trad_mul

i took a look at your ann approach, few thoughts:

  • needs more / deeper layers, e.g. [16, 256, 128, 8]
  • this is a regression problem, not classification
    (you want the output as close as possible to the trad_mul)
    so sigmoid is the wrong activation function for the output layer, as it squashes values into a fixed, bounded range. try LINEAR (IDENTITY) instead (sadly you cannot set this per layer here…)
  • your train input is some grid, but the test input is random, hmmm
    (i’d go for all random, so you have an “infinite” amount of train data)
  • do this in batches, using the update flag. train a few thousand iterations with ~1000 random octonion pairs, test with a few hundred, distance between test results and trad_mul should give you a loss value.
    rinse & repeat, loss should go down
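
The batch recipe in the last bullet could be sketched roughly like this. It is only a sketch under assumptions, not code from the repository: `random_unit_pairs`, `make_targets` and `mse_loss` are hypothetical helper names, and `exact_mul` stands in for whatever exact multiplication routine (the trad_mul mentioned above) is being approximated.

```python
import numpy as np

def random_unit_pairs(batch_size, num_components=8, rng=None):
    """Return a (batch_size, 2 * num_components) array of random unit-norm pairs."""
    rng = np.random.default_rng() if rng is None else rng
    pairs = rng.standard_normal((batch_size, 2, num_components)).astype(np.float32)
    # Scale each of the two operands in every row to unit length.
    pairs /= np.linalg.norm(pairs, axis=2, keepdims=True)
    return pairs.reshape(batch_size, 2 * num_components)

def make_targets(batch, exact_mul, num_components=8):
    """Apply the exact multiplication (assumed callable) to every row of the batch."""
    a = batch[:, :num_components]
    b = batch[:, num_components:]
    return np.stack([exact_mul(x, y) for x, y in zip(a, b)]).astype(np.float32)

def mse_loss(predicted, target):
    """Mean squared error between network output and exact result."""
    return float(np.mean((predicted - target) ** 2))
```

Each round would draw a fresh training batch with `random_unit_pairs`, update the network on it, then draw a smaller test batch and watch `mse_loss` go down.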

then, wikipedia had a nice link to this paper
Deep octonion networks are a thing, hehe !
(not that i understood any of it …)
but if you scroll down to fig.1, you can see, that each of the 8 octonion components gets its own 2d input map !
(again, all of it for a very different purpose, and i don’t think you can use any of it for your multiplication problem)

well forget all advice above ;(
i made my own attempt at it, and could not get anything useful out of the ANN_MLP

problem is: the last layer should have IDENTITY activation, but you can set this for ALL layers only.
with IDENTITY, it predicts nan’s (BACKPROP) or just does not learn anything (RPROP,ANNEAL)
with SIGMOID_SYM, it predicts wrong output values.

imho, opencv’s ANN_MLP is the wrong tool for this problem
(there is also no regression test here, so no one thought of it)

if you still want to try a nn for this, rather move on to pytorch (maybe even on colab) or such, where you have better control over the network
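
For illustration, here is roughly what "better control over the network" can look like in PyTorch: the hidden layers get a nonlinearity, while the last layer has no activation at all (identity), which is what a regression output wants. This is a minimal sketch, not the code used later in this thread. The layer sizes follow the [16, 256, 128, 8] suggestion above; the Tanh hidden activation, the Adam optimizer, the learning rate and the name `train_step` are all assumptions.

```python
import torch
from torch import nn

num_components = 8  # octonions; two operands in, one product out

net = nn.Sequential(
    nn.Linear(2 * num_components, 256),
    nn.Tanh(),                       # assumed hidden activation
    nn.Linear(256, 128),
    nn.Tanh(),
    nn.Linear(128, num_components),  # no activation here: identity output
)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)  # assumed optimizer
loss_fn = nn.MSELoss()

def train_step(inputs, targets):
    """Run one gradient update on a batch and return the MSE loss as a float."""
    optimizer.zero_grad()
    loss = loss_fn(net(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Calling `train_step` repeatedly on fresh random batches gives the "rinse & repeat" loop described earlier.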

Thanks again berak.

My try at the ANN is at GitHub - sjhalayka/ann_trad_mul – doesn’t work well at all, as you have confirmed.

I read somewhere that you can use TensorFlow with C++. Sounds like fun!

OK, so I’ve resigned myself to using Python. LOL I’m checking out PyTorch. Thanks for the pointer.

hey, still with us ?
couldn’t leave it alone, and had to try my own dogfood, hehe ;)
i could get the mse < 0.005, results are awesome:

[[-0.26470596 -1.1059655   0.8352055   0.52348846  0.10831118  0.61087406
   0.877185   -0.2339818 ]
 [ 0.5304312   0.28641003 -0.23038895  1.7396963   0.9543278  -0.15201017
  -0.32512766 -0.63941836]
 [ 1.271717   -1.0696023  -1.7634459   1.3812865   0.7645521   0.86197543
  -0.11058927  0.7089186 ]
 [ 0.05578786 -0.5496834   0.08307987  0.47892642  1.1344438   0.8008164
   1.7119472  -0.711021  ]
 [-0.01412529  0.16617292  0.31411836 -0.54852575 -0.09500442 -0.5103319
   1.1987839   1.5580554 ]
 [-1.1071944   0.19642657 -2.0996773  -1.3792093   0.02593207  0.48899403
  -0.95860386 -0.69786584]
 [ 0.06335697  1.8394524   1.2040088  -0.7028511  -1.4141586   1.3727264
  -1.3220475  -0.92453116]
 [-0.9785156  -0.1938      0.93648046 -1.1895093  -0.25253427  1.1505108
   0.07079685 -0.24422744]
 [ 0.9060701   0.47642064 -0.6221065   0.23977897 -0.5096241   0.03327567
   0.9862392  -0.30424342]
 [-0.24386582 -0.39277518 -0.09294297 -1.441422    0.66120946  0.24129298
   0.24979496 -0.66661644]]
[[-0.22463474 -1.0463736   0.7836859   0.52826786  0.08230602  0.6770275
   0.86740434 -0.14845707]
 [ 0.53898156  0.27856505 -0.25398877  1.7475549   0.9918187  -0.10367861
  -0.40210718 -0.65490824]
 [ 1.256039   -0.9462424  -1.6523921   1.2664773   0.72396016  0.8815708
  -0.1294247   0.6481081 ]
 [ 0.0697846  -0.50714713  0.04150326  0.49762183  1.1242275   0.90655166
   1.644487   -0.6425128 ]
 [ 0.00779554  0.23888205  0.25112483 -0.51644474 -0.10923553 -0.4203535
   1.1525563   1.5756216 ]
 [-1.2055006   0.25920594 -2.1307118  -1.3793143   0.06139605  0.56379914
  -1.0755298  -0.5691055 ]
 [ 0.09848204  1.8555393   1.1761844  -0.7658939  -1.3464015   1.3967674
  -1.3304011  -0.78789973]
 [-1.0429541  -0.13010623  0.92357194 -1.204276   -0.24126603  1.243408
   0.01066075 -0.1784872 ]
 [ 0.9119352   0.53681195 -0.6374782   0.25338084 -0.5011101   0.13597292
   0.9528497  -0.25991094]
 [-0.28418157 -0.2987693  -0.08471005 -1.3918653   0.6323257   0.29227516
   0.160787   -0.525723  ]]

but now, mate, you owe me an explanation:
wth is this all for ? what’s the purpose of approximating something you can calculate exactly ?

You, sir, are a scholar and a gentleman. I thank you so much for all of your efforts. I’d still be in the stone age if it weren’t for you.

Well, what I am wondering is whether the network can have a lower time complexity than the exact traditional multiplication function. The traditional octonion multiplication consists of 8 x 8 = 64 multiplications and just about the same number of additions/subtractions. Now I will try it on pathions, which need 32 x 32 = 1024 multiplications.
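
To make the operation count concrete, here is a small pure-Python sketch of hypercomplex multiplication via the Cayley-Dickson construction, using the common convention (a, b)(c, d) = (ac - d*b, da + bc*). It is an assumption that this matches the sign convention of the trad_mul code in the repository, and the names `cd_mul`, `conjugate`, `norm` and `count_scalar_muls` are made up for this example. The same function handles quaternions (4 components), octonions (8) and pathions (32).

```python
import math

def conjugate(x):
    """Negate every component except the real part."""
    return [x[0]] + [-v for v in x[1:]]

def cd_mul(x, y):
    """Multiply two Cayley-Dickson numbers given as equal-length lists (length 2^k)."""
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    # (a, b)(c, d) = (a c - conj(d) b, d a + b conj(c))
    first = [p - q for p, q in zip(cd_mul(a, c), cd_mul(conjugate(d), b))]
    second = [p + q for p, q in zip(cd_mul(d, a), cd_mul(b, conjugate(c)))]
    return first + second

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def count_scalar_muls(n):
    """Scalar multiplications used by cd_mul for n components: 4 half-size products."""
    return 1 if n == 1 else 4 * count_scalar_muls(n // 2)
```

With this recursion `count_scalar_muls(8)` is 64 and `count_scalar_muls(32)` is 1024, matching the 8 x 8 and 32 x 32 figures above.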

Your results are spectacular! Thank you!!!

The imagery associated with this project is as follows. It is a triangle mesh, and it’s rendered using OpenGL 4, Phong shading, and a shadow map. The mesh is cut in half, to show “internal” detail.

I needed to normalize the octonions, so I’m doing this (I optimized it a little; it’s a bit faster):

import math
import numpy as np

def normalize_batch(batch):

  # Scale both operands in every row to unit length, in place.
  # The dot-product call was lost from the post; np.dot is assumed here.
  for i in range(batch.shape[0]):

    batch[i, 0:num_components] /= math.sqrt(np.dot(batch[i, 0:num_components], batch[i, 0:num_components]))
    batch[i, num_components:num_components*2] /= math.sqrt(np.dot(batch[i, num_components:num_components*2], batch[i, num_components:num_components*2]))

  return batch

Is there a better way?
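
One possible answer, as a sketch: the per-row loop can be replaced by a single vectorized norm over a reshaped view. This assumes the batch is a NumPy array with one operand pair per row (for a torch tensor, `torch.linalg.norm` with `dim=2, keepdim=True` would play the same role). The name `normalize_batch_vectorized` is made up here, and unlike the loop version it returns a new array instead of modifying the input.

```python
import numpy as np

def normalize_batch_vectorized(batch, num_components):
    """Normalize both operands in every row of a (N, 2 * num_components) array."""
    # View each row as two vectors of length num_components.
    pairs = batch.reshape(batch.shape[0], 2, num_components)
    # One Euclidean norm per operand, kept broadcastable against pairs.
    norms = np.linalg.norm(pairs, axis=2, keepdims=True)
    return (pairs / norms).reshape(batch.shape)
```

As with the loop version, an all-zero operand would cause a division by zero, so that case would need its own handling.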

P.S. The whole code is at: GitHub - sjhalayka/ann_quat_fractal

Just want to say thanks again! This code works very well!

Good news. I used a replacement function for the traditional multiplication, and for pathions (num_components = 32), the timing is 40 seconds for traditional multiplication versus 12 seconds for the trained multiplication.

For reference, for octonions (num_components = 8), the timing is 10 seconds versus 10 seconds.

I’d say that it’s a success.

P.S. I wonder… is there a way to use non-normalized number types?

P.P.S. Is there any way to accelerate this entire script on the GPU?

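import numpy as np

def traditional_mul_replacement(in_a, in_b):

  # Does num_components^2 multiply-adds, like the real multiplication,
  # but the sum is discarded and a zero vector is returned.
  out = np.zeros([num_components], np.float32)
  answer = 0
  for i in range(num_components):
    for j in range(num_components):
      answer += in_a[i] * in_b[j]

  return out.T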


Thanks again berak!

The results speak for themselves. On the left we have the AI-generated multiplication, and on the right is the ground truth multiplication.

I honestly didn’t think that it would work so well.

that’s entirely unlikely to me.
to model unknown complexity, you usually have to come up with a far larger beast, than the original, exact solution
if you still get any gains from nn’s it’s probably due to better vectorization / optimization there (again, opinion !)

hey, i just looked and found, you’re doing all of it on vertices ?
(still marching on cubes, hey…)

for python, there are various gpu accelerators (e.g. numba). pytorch can also run on gpu

but well, i remember running julia fractals from a pixel/fragment shader a decade ago… seems to be defunct today, but i owe iq almost everything i know about 3d

pps: mandatory image:

For quaternions both methods are equal in speed. This leads me to suspect that the AI method is performed on the GPU. For pathions, the AI method is faster by a factor of like 6.

I do like Marching Cubes LOL. The only drawback to the Python version is that it doesn’t perform iteration while calculating the vertex positions. I mean, it just does linear interpolation, which leads to blocky results. :( I have incentive to port my Marching Cubes code (based on Paul Bourke’s work) to Python LOL

Anyway. Thanks once again man!

Just curious if you’d take another peek at the code, to see if there are any loops that can be avoided by using some Python slicing or whatnot?

Right now it runs soooooo slow. I also notice that if I set device="cpu", it runs faster than on the GPU. (???)

sure, if you wanted the input batch like this:

|                     |                       |
|                     |                       |
|                     |                       |
|                     |                       |
|    slice_a          |   slice_b             |
|                     |                       |
|                     |                       |
|                     |                       |
|                     |                       |

the whole get_predictions() could be simplified to:

def get_predictions():
    # join the two slices column-wise into one (N, 2 * num_components) batch
    batch = torch.cat((float_slice_a, float_slice_b), 1)
    return net(batch).cpu().detach().numpy()

this loop looks like a few linspaces

and you never use the float_slices here or do you ? (also Z)

up/downloads between cpu <–> gpu are costly.
anytime you do a .to(device) or .cpu() it’s doing a full copy
(in gpu mode, but all of this is a no-op in cpu mode)
so, if you want to profit from a gpu, keep your data there as long as you can, and try to express the ops using torch functions
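
A rough sketch of that advice: create the inputs directly on the device, run the network there, and copy back to the CPU once at the end. Everything here is an assumption for illustration, not the code from the repository: the single `Linear` layer merely stands in for the trained network, and the way `slice_a` and `slice_b` are filled is arbitrary.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_components = 8
n = 1000  # number of input rows

# Stand-in for the trained network (assumption), moved to the device once.
net = torch.nn.Linear(2 * num_components, num_components).to(device)

def get_predictions(slice_a, slice_b):
    """Concatenate the two slices column-wise and run the net; result stays on device."""
    with torch.no_grad():
        batch = torch.cat((slice_a, slice_b), 1)
        return net(batch)

# Build the inputs on the device, so no upload is needed later.
slice_a = torch.linspace(-1.0, 1.0, n, device=device).unsqueeze(1).repeat(1, num_components)
slice_b = torch.rand(n, num_components, device=device)

out = get_predictions(slice_a, slice_b)  # still on the device
result = out.cpu().numpy()               # single download at the very end
```

If later steps can also be written with torch functions, `out` can be passed along as is and the final `.cpu()` postponed even further.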

Allow me to thank you once again. I shall meditate upon what you’ve shown me.

P.S. Yes, linspace!!! I see now.

Hi berak, I uploaded more scripts. and both generate obj files. However, I notice that using scikit marching cubes generates meshes that are scaled and translated. I am disappointed. I will be writing my own marching cubes API for Python.

P.S. I got rid of slice a and b altogether, avoiding the call to — thank you again for all of your help, man!

PyTorch C++ is a thing! See “Using the PyTorch C++ Frontend” (PyTorch Tutorials 1.11.0+cu102 documentation).

I’m not terribly familiar with the lambda expressions in C++ and Python, and I’m a slow learner. Can one skip a for loop using a lambda expression? Sorry if that question doesn’t make sense.
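
On the Python side, a lambda is just an unnamed function, so by itself it does not remove a loop. Passing it to `map` hides the loop, but the iteration still happens one element at a time. The tiny sketch below, with made-up names, shows that both forms compute the same thing. Real speedups usually come from expressing the work as whole-array operations (NumPy or torch) instead.

```python
values = [1.0, 2.0, 3.0]

# A lambda is an anonymous function, equivalent to a small def.
square = lambda x: x * x

# Explicit for loop.
looped = []
for v in values:
    looped.append(square(v))

# Same work via map: the loop is hidden, not gone.
mapped = list(map(square, values))
```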